About IBM Content Manager OnDemand for iSeries Common
Server Indexing Reference (SC27-1160) 


This book contains information about indexing methods, preparing index data, and 
using tools to index reports that you plan to store in and retrieve from IBM 
Content Manager OnDemand for iSeries Common Server Version 5 Release 2 
(OnDemand). 


Who should read this book 


This book is of primary interest to administrators and other people in an 
organization who are responsible for preparing data to be stored in OnDemand. 


How this book is organized 


This book is organized in the following parts. Each part contains information about 
one of the indexing tools provided with OnDemand: 


* Part 1, “OS/400 indexer reference”, explains how to use the
administrative client graphical tool to define the index criteria that the OS/400®
indexer uses to locate and create index data for your spooled files.


* Part 2, “PDF indexer reference”, describes how to use the OnDemand
PDF Indexer to generate index data for Adobe PDF files.


* Part 3, “Generic indexer reference”, describes how to use the
OnDemand Generic Indexer to specify index data for other types of input data.


Prerequisite and related information


Use the IBM iSeries Information Center as your starting point for looking up
iSeries technical information.


You can access the Information Center two ways: 
* From the following Web site: http://www.ibm.com/eserver/iseries/infocenter
* From CD-ROMs that ship with your Operating System/400® order:
iSeries Information Center, SK3T-4091-02. This package also includes the PDF
versions of iSeries manuals, iSeries Information Center: Supplemental Manuals,
SK3T-4092-01, which replaces the Softcopy Library CD-ROM.


The Information Center contains advisors and important topics such as Java™,
TCP/IP, Web serving, secured networks, logical partitions, clustering, CL
commands, and system application programming interfaces (APIs). It also includes
links to related IBM Redbooks™ and Internet links to other IBM Web sites such as
the IBM home page.


Other information available on the World Wide Web 


More iSeries information is available on the World Wide Web. You can access 
general information from the iSeries home page, which is at the following Web 
site: http://www-1.ibm.com/servers/eserver/iseries/


To access workshops on advanced iSeries functions, use the Technical Studio,
located at: http://www.iseries.ibm.com/tstudio/


Worldwide, you can read about, select, order and take delivery of iSeries program
temporary fixes (PTF) over the Internet. iSeries Internet PTFs (downloads) and 
Preventive Service Planning (PSP) information are available at the following 
Internet location: http://as400service.ibm.com 


iSeries Navigator 


IBM iSeries Navigator is a powerful graphical interface for managing your iSeries 
servers. iSeries Navigator functionality includes system navigation, configuration, 
planning capabilities, and online help to guide you through your tasks. iSeries 
Navigator makes operation and administration of the server easier and more 
productive and is the only user interface to the new, advanced features of the 
OS/400. It also includes Management Central for managing multiple servers from 
a central system. 


You can find more information on iSeries Navigator in the IBM iSeries Information 
Center and at the following Web site: 
http://www.ibm.com/eserver/iseries/navigator/


How to send your comments 


Your feedback is important in helping to provide the most accurate and 
high-quality information. Please send any comments that you have about this 
publication. 


* If you prefer to send comments by FAX, use either of the following numbers:
- United States, Canada, and Puerto Rico: 1-800-937-3430
- Other countries: 1-507-253-5192
* If you prefer to send comments electronically, use one of these e-mail addresses:
- Comments on books: RCHCLERK@us.ibm.com

Be sure to include the following:
* The publication number of a book
* The page number or topic of a book to which your comment applies


Summary of changes


This edition of IBM Content Manager OnDemand for iSeries Common Server: Indexing 
Reference contains new technical information. There may be some instances where 
changes were made, but change bars are missing. Significant changes to note are: 


You can automate the loading of non-spooled file data such as PC files in IFS with 
the Start Monitor for OnDemand (STRMONOND) command using *DIR (directory) 
for the TYPE parameter. See Appendix A of the IBM Content Manager OnDemand for 
iSeries Common Server: Administration Guide for information on the STRMONOND 
command. 


Additional keywords have been added to many OnDemand commands to more 
precisely identify the spooled file that the command will use. The new keywords 
correspond to the same new keywords available for OS/400 spooled file 
commands, allowing you to specify the system on which the spooled file was 
created, as well as the spooled file creation date and time. 


Portable Application Solutions Environment (PASE), a product option of OS/400, is 
now an optional software prerequisite for the OnDemand Common Server. PASE is 
required if you plan to use the new OnDemand Common Server text search 
function for AFPDS documents. It is also possible that, in the future, other new 
functions of OnDemand may require PASE. 


Additions and enhancements have been made to the sample programs for both 
Common Server and Spool File Archive. Sample programs for Common Server can 
be found in QSAMPLES2 source file in library QRDARS. Sample programs for 
Spool File Archive can be found in QSAMPLES source file in library QRDARS. 


Record Archive provides commands and application programming interfaces 
(APIs) that let you store and retrieve data records on optical media for users who 
only require occasional access to historical data. At Version 5 Release 2, this 
product option is provided for existing Record Archive customers to use, but there 
are no planned enhancements. Documentation can be found in OnDemand 
publications from previous releases. Please talk to your software provider about 
other options, such as compressed DASD.


Part 1. OS/400 indexer reference


The OS/400 indexer is the most common OnDemand indexer used for OS/400 
spooled files. It is called by the ADDRPTOND command for SCS, SCS-extended, 
Advanced Function Presentation™ (AFP™), and LINE spooled files. You use the
OnDemand administrative client’s graphical indexing tool to define the index 
criteria that the OS/400 indexer uses to locate and create index data for your 
spooled files. 


The graphical tool can be invoked in one of two ways: 
* By clicking the Select Sample Data button within the Report Wizard, or
* By selecting Sample Data and clicking the Modify button on the Indexer
Information panel while creating an application


OnDemand will use this OS/400 indexer by default for SCS, SCS-extended, AFP, 
and LINE spooled files. See the Report Wizard section in the Introduction of the 
IBM Content Manager OnDemand for iSeries Common Server: Administration Guide for 
more information on the Report Wizard. See the section on Adding the Application 
in the Examples chapter of the IBM Content Manager OnDemand for iSeries Common 
Server: Administration Guide for more information on defining an application 
without using the Report Wizard.


Part 2. PDF indexer reference


This part of the book provides information about the OnDemand PDF indexer. You 
can use the PDF indexer to extract index data from and generate index data about 
Adobe PDF files that you want to store in OnDemand.


Chapter 1. Overview


What is the PDF indexer? 


The OnDemand PDF indexer is a program that you can use to extract index data 
from and generate index data about Adobe PDF input files. The index data can 
enhance your ability to store, retrieve, and view documents with OnDemand. The 
PDF indexer supports PDF Version 1.3 input and output data streams. For more 
information about the PDF data stream, see the Portable Document Format Reference 
Manual, published by Adobe Systems Incorporated. Adobe also provides online 
information with the Acrobat Exchange and Acrobat Distiller products, including 
online guides for Adobe Capture, PDFWriter, Distiller, and Exchange. 


You define and store PDF documents on the server using standard OnDemand 
functions. You must define an OnDemand application and application group. As 
part of the application, you must define the indexing parameters used by the PDF 
indexer to process input files. You can automate the indexing and loading of data 
by using special parameters of the ADDRPTOND (using *STMF for the INPUT 
parameter) or STRMONOND (using *DIR for the TYPE parameter) commands or 
the ARSLOAD API program. See the Command Reference appendix of the IBM 
Content Manager OnDemand for iSeries Common Server: Administration Guide for more 
information on the ADDRPTOND and STRMONOND commands. See the API 
Reference appendix of the IBM Content Manager OnDemand for iSeries Common 
Server: Administration Guide for more information on the ARSLOAD API program 
and its parameters. 


After you index and store input files in OnDemand, you use the OnDemand client 
program to view the PDF document or documents created during the indexing and 
loading process. You can also print pages of the PDF document you are viewing 
from the OnDemand client program. 


Figure 1 illustrates the process of indexing and loading PDF input files.


Figure 1. Processing PDF input files in OnDemand


The PDF indexer processes PDF input files. A PDF file is a distilled version of a 
PostScript file, adding structure and efficiency. 


OnDemand retrieves processing information from application and application 
group definitions that are stored in the database. The application definition 
identifies the type of input data, the indexing program used to index the input 
files, the indexing parameters, and other information about the input data. The 
application group identifies the database and storage management characteristics 
of the data. You can use the administrative client to create the application and the 
indexing parameters. 


When OnDemand processes a PDF input file and the application Indexing 
Information page specifies PDF as the indexer, it automatically calls the PDF 
indexer to process the input file. The PDF indexer processes the PDF input file 
with indexing parameters that determine the location and attributes of the index 
data. The PDF indexer extracts index data from the PDF file and generates an 
index file and an output file. The output file contains groups of indexed pages. A 
group of indexed pages can represent the entire input file or, more typically, one or 
more pages from the input file. If the input file contains logical groups of pages, 
such as statements or policies, the PDF indexer can create an indexed group for 
each statement or policy in the input file. That way, users can retrieve a specific 
statement or set of statements, rather than the entire file. After indexing the data, 
OnDemand stores the index data in the database and the indexed groups on disk 
or archive storage volumes.


How OnDemand uses index information


Every item stored in OnDemand is indexed with one or more group-level indexes. 
Groups are determined when the value of an index changes (for example, account 
number). When you load a PDF file into the system, OnDemand invokes the PDF 
indexer to process the indexing parameters and create the index data. OnDemand 
then loads the index data into the database, storing the group-level attribute values 
that the PDF indexing program extracted from the data into their corresponding 
database fields. Figure 2 illustrates the index creation and data loading process.


Figure 2. Indexing and loading data


You typically create an application for each report that you plan to store in 
OnDemand. When you create an application, you define the indexing parameters 
that the indexing program uses to process the report and create the index data that 
is loaded into the database. For example, an INDEX parameter includes an 
attribute name and identifies the FIELD parameter that the indexing program uses 
to locate the attribute value in the input data. When you create an application, you 
must assign the application to an application group. The attribute name you 
specify on an INDEX parameter should be the same as the name of the application 
group database field into which you want OnDemand to store the index values. 


You define database fields when you create an application group. OnDemand 
creates a column in the application group table for each database field that you 
define. When you index a report, you create index data that contains index field 
names and index values extracted from the report. OnDemand stores the index 
data into the database fields. 


To search for reports stored in OnDemand, the user opens a folder. The search 
fields that appear when the user opens the folder are mapped to database fields in 
an application group (which, in turn, represent index attribute names). The user 
constructs a query by entering values in one or more search fields. OnDemand 
searches the database for items that contain the values (index attribute values) that 
match the search values entered by the user. Each item contains group-level index 
information. OnDemand lists the items that match the query. When the user selects 
an item for viewing, the OnDemand client program retrieves the selected item 
from disk or archive storage. 
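
The mapping from folder search fields to application group database fields can be pictured with a small Python sketch. It uses an in-memory SQLite table purely for illustration. The table name, column names, folder field labels, and sample rows are all hypothetical, and the sketch does not represent OnDemand's actual database schema or query mechanism.

```python
import sqlite3

# Hypothetical mapping: folder search field label -> application group database field.
FOLDER_TO_DB_FIELD = {
    "Customer Name": "cust_name",
    "Account Number": "acct_num",
}


def build_query(search_values):
    """Build a parameterized SELECT from the search fields the user filled in."""
    clauses, params = [], []
    for folder_field, value in search_values.items():
        # Column names come from the fixed mapping above, never from user input.
        clauses.append(f"{FOLDER_TO_DB_FIELD[folder_field]} = ?")
        params.append(value)
    sql = "SELECT doc_id FROM app_group"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


def demo_search(search_values):
    """Load two sample index rows and return the doc_ids that match the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_group (doc_id INTEGER, cust_name TEXT, acct_num TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_group VALUES (?, ?, ?)",
        [
            (1, "John Smythe", "0000-3727-1644-0099"),
            (2, "Earl Hawkins", "0000-1111-2222-3333"),  # made-up sample row
        ],
    )
    sql, params = build_query(search_values)
    rows = conn.execute(sql + " ORDER BY doc_id", params).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Each entry in the mapping plays the role of one folder search field. Only the fields in which the user enters a value contribute a condition to the query.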


Processing PDF input files with the graphical indexer


This section describes how to use the graphical indexer to create indexing 
information for PDF input files.


Important: If you plan to use the report wizard or the graphical indexer to process
PDF input files, then you must first install Adobe Acrobat or Adobe 
Acrobat Approval on the PC from which you plan to run the 
administrative client. You must purchase Adobe Acrobat and Adobe 
Acrobat Approval from Adobe. 


OnDemand provides the ARSPDF32.API file to enable PDF viewing from 
the client. If you install the client after you install Adobe Acrobat, then 
the installation program will copy the API file to the Acrobat plug-in 
directory. If you install the client before you install Adobe Acrobat, 
then you must copy the API file to the Acrobat plug-in directory. Also, 
if you upgrade to a new version of Acrobat, then you must copy the 
API file to the new Acrobat plug-in directory. The default location of 
the API file is \Program Files\IBM\OnDemand32\PDF. The default 
Acrobat plug-in directory is \Program Files\Adobe\Acrobat 
x.y\Acrobat\Plug_ins, where x.y is the version of Acrobat, for 
example, 4.0, 5.0, and so forth. 


Beginning with Version 5.2, you can define indexing information in a visual 
environment. You begin by opening a sample input file with the graphical indexer. 
You can run the graphical indexer from the report wizard or by choosing the 
sample data option from the Indexing Information page of the application. After 
you open an input file in the graphical indexer, you define triggers, fields, and 
indexes. The PDF indexer uses the triggers, fields, and indexes to locate the 
beginning of a document in the input data and extract index values from the input 
data. Once you have defined the triggers, fields, and indexes, you can save them in 
the application so that OnDemand can use them later on to process the input files 
that you load into the system. 


You define a trigger, field, or index by drawing a box around a text string with the 
mouse and then specifying properties. For example, to define a trigger that 
identifies the beginning of a document, you could draw a box around the text 
string Account Number on the first page of a statement in the input file. Then, on 
the Add a Trigger dialog box, you would accept the default values provided, such 
as the location of the text string on the page. When processing an input file, the 
PDF indexer attempts to locate the specified string in the specified location. When 
a match occurs, the PDF indexer knows that it has found the beginning of a 
document. The fields and indexes are based on the location of the trigger. 
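
The way a trigger marks the beginning of each document can be sketched in Python. The sketch assumes that every word on a page is available as a tuple of its text and bounding box in inches, which is a hypothetical representation and not the PDF indexer's internal format. A page is treated as the start of a document when the words inside the trigger box spell out the trigger string.

```python
def text_in_box(words, ul, lr):
    """Join, in reading order, the words whose boxes lie entirely inside ul..lr.

    Each word is a tuple (text, ulx, uly, lrx, lry), measured in inches from
    the upper left corner of the page.
    """
    inside = [
        w for w in words
        if w[1] >= ul[0] and w[2] >= ul[1] and w[3] <= lr[0] and w[4] <= lr[1]
    ]
    inside.sort(key=lambda w: (w[2], w[1]))  # top to bottom, then left to right
    return " ".join(w[0] for w in inside)


def document_start_sheets(pages, ul, lr, value):
    """Return the 1-based sheet numbers of the pages on which the trigger matches."""
    return [
        sheet
        for sheet, words in enumerate(pages, start=1)
        if text_in_box(words, ul, lr) == value
    ]
```

Running a check like this over a sample file with several documents mirrors the verification step in the graphical indexer: every sheet number returned should be the first page of a document.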


The PDF file that you open with the graphical indexer should contain a 
representative sample of the type of input data that you plan to load into the 
system. For example, the sample input file must contain at least one document. A 
good sample should contain several documents so that you can verify the location 
of the triggers, fields, and indexes on more than one document. The sample input 
file must contain the information that you need to identify the beginning of a 
document in the input file. The sample input file should also contain the 
information that you need to define the indexes. When you load an input file into 
the system, the PDF indexer will use the indexing information that you create to 
locate and extract index values for each document in the input file. 


The following example describes how to use the graphical indexer from the report 
wizard to create indexing information for an input file. The indexing information 
consists of a trigger that uniquely identifies the beginning of a document in the 
input file and the fields and indexes for each document. 


1. To begin, start the administrative client.

2. Log on to a server.

3. Start the report wizard by clicking the Report Wizard icon on the toolbar. The
report wizard opens the Sample Data dialog box.

4. Click Select Sample Data to open the Open dialog box. Note: The Sample Data
is limited to a PC file when using the graphical PDF indexer. The graphical
PDF indexer is designed to work with workstation PDF files, not PDF spooled
files in an output queue on the iSeries server.

5. Type the name or full path name of a file in the space provided or use the
Look in or Browse commands to locate a file.

6. Click Open. The graphical indexer opens the input file in the report window.

7. Press F1 to open the main help topic for the report window. The main help
topic contains general information about the report window and contains links
to other topics that describe how to add triggers, fields, and indexes. Under
Options and Commands, click Indexer Information page to open the Indexing
Commands topic. (You can also use the content help tool to display
information about the icons on the toolbar.) Under Tasks, Indexer Information
page, click Adding a trigger (PDF).

8. Close any open help topics and return to the report window.

9. Define a trigger.

* Find a text string that uniquely identifies the beginning of a document, for
example, Account Number, Invoice Number, Customer Name, and so forth.

* Using the mouse, draw a box around the text string. Start just outside of
the upper left corner of the string. Click and hold mouse button one. Drag
the mouse towards the lower right corner of the string. As you drag the
mouse, the graphical indexer uses a dotted line to draw a box. When you
have enclosed the text string completely inside of a box, release the mouse
button. The graphical indexer highlights the text string inside of a box.

* Click the Define a Trigger icon on the toolbar to open the Add a Trigger
dialog box. Verify the attributes of the trigger. For example, the text string
that you selected in the report window should be displayed under Value;
for Trigger1, the Pages to Search should be set to Every Page. Click Help for
assistance with the other options and values that you can specify.

* Click OK to define the trigger.

* To verify that the trigger uniquely identifies the beginning of a document,
first put the report window in display mode. Then click the Select tool to
open the Select dialog box. Under Triggers, double-click the trigger. The
graphical indexer highlights the text string in the current document.
Double-click the trigger again. The graphical indexer should highlight the text
string on the first page of the next document. Use the Select dialog box to
move forward to the first page of each document and return to the first
document in the input file.

* Put the report window in add mode.

10. Define a field and an index.

* Find a text string that can be used to identify the location of the field. The
text string should contain a sample index value. For example, if you want
to extract account number values from the input file, then find where the
account number is printed on the page.

* Using the mouse, draw a box around the text string. Start just outside of
the upper left corner of the string. Click and hold mouse button one. Drag
the mouse towards the lower right corner of the string. As you drag the
mouse, the graphical indexer uses a dotted line to draw a box. When you
have enclosed the text string completely inside of a box, release the mouse
button. The graphical indexer highlights the text string inside of a box.

* Click the Define a Field icon on the toolbar to open the Add a Field dialog
box.

* On the Field Information page, verify the attributes of the index field. For
example, the text string that you selected in the report window should be
displayed under Reference String; the Trigger should identify the trigger on
which the field is based. Click Help for assistance with the options and
values that you can specify.

* On the Database Field Attributes page, verify the attributes of the database
field. In the Database Field Name space, enter the name of the application
group field into which you want OnDemand to store the index value. In the
Folder Field Name space, enter the name of the folder field that will appear
on the client search screen. Click Help for assistance with the other options
and values that you can specify.

* Click OK to define the field and index.

* To verify the locations of the fields, first put the report window in display
mode. The fields should have a blue box drawn around them. Next, click
the Select tool to open the Select dialog box. Under Fields, double-click
Field 1. The graphical indexer highlights the text string in the current
document. Double-click Field 1 again. The graphical indexer should move
to the next document and highlight the text string. Use the Select dialog
box to move forward to each document and display the field. Then return
to the first document in the input file.

* Put the report window in add mode.

11. Click the Display Indexer Parameters tool to open the Display Indexer
Parameters dialog box. The Display Indexer Parameters dialog box lists the
indexing parameters that the PDF indexer will use to process the input files
that you load into the application. At a minimum, you need one trigger, one
field, and one index. See Chapter 3, “Parameter reference”, for
details about the indexing parameters.

12. When you have finished defining all of the triggers, fields, and indexes, close
the report window.

13. Click Yes to save the changes to the indexer parameters.

14. On the Sample Data window, click Next to continue with the report wizard.


Manually indexing input data 


Note: If you prefer creating your own PDF indexing parameters manually rather
than using the graphical PDF indexer, you can use the instructions in the
remainder of this chapter to do so.


Indexing concepts 


Indexing parameters include information that allows the PDF indexer to identify
key items in the print data stream, tag these items, and create index elements 
pointing to the tagged items. OnDemand uses the tag and index data for efficient, 
structured search and retrieval. You specify the index information that allows the 
PDF indexer to segment the data stream into individual items, called groups. A 
group is a collection of one or more pages, such as a bank statement, insurance 
policy, phone bill, or other logical segment of a report. The PDF indexer creates 
indexes for each group when the value of an index changes (for example, account 
number).


A tag is made up of an attribute name, for example, Customer Name, and an
attribute value, for example, Earl Hawkins. Tags also include information that tells
the PDF indexer where to locate the attribute value on a page. For example, a tag 
used to collect customer name index values provides the PDF indexer with the 
starting and ending position on the page where the customer name index values 
appear. The PDF indexer generates index data and stores it in a generic index file. 
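
The grouping rule described in this section, where a new group begins whenever an index value changes, can be sketched in Python. The sketch assumes that an index value such as the account number has already been extracted for every page. That is a simplification for illustration; it does not show how the PDF indexer itself locates values.

```python
from itertools import groupby


def split_into_groups(page_values):
    """Split consecutive pages into groups that share the same index value.

    page_values holds one index value per page, in file order. The result is a
    list of (index_value, [sheet numbers]) pairs, with sheet numbers starting
    at 1 for the first page in the file.
    """
    numbered = enumerate(page_values, start=1)
    return [
        (value, [sheet for sheet, _ in run])
        for value, run in groupby(numbered, key=lambda item: item[1])
    ]
```

For example, six pages carrying the values A, A, B, C, C, C would form three groups: sheets 1 to 2, sheet 3, and sheets 4 to 6.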


Coordinate system 


The locations of the text strings the PDF indexer uses to determine the beginning of
a group and index values are described as x and y pairs in a coordinate system
imposed on the page. For each text string, you identify its upper left and lower 
right position on the page. The upper left corner and lower right corner form a 
string box. The string box is the smallest rectangle that completely encloses the text 
string. The origin is in the upper left hand corner of the page. The x coordinate 
increases to the right and y increases down the page. You also identify the page on 
which the text string appears. For example, the text string Customer Name, that 
starts 4 inches to the right and 1 inch down and ends 5.5 inches to the right and 
1.5 inches down on the first page in the input file can be located as follows: 


ul(4,1),lr(5.5,1.5),1,'Customer Name'


OnDemand provides the ARSPDUMP command to help you identify the locations 
of text strings on the page. 
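
To make the string box notation concrete, the following Python sketch parses a location written in the form shown above into its corner coordinates, page number, and text. The `StringBox` type and `parse_location` function are hypothetical helpers for illustration. They handle only this illustrative notation and are not the PDF indexer's own parser.

```python
import re
from typing import NamedTuple


class StringBox(NamedTuple):
    ulx: float  # upper left x, inches from the left edge of the page
    uly: float  # upper left y, inches from the top edge of the page
    lrx: float  # lower right x
    lry: float  # lower right y
    page: int   # page on which the text string appears
    text: str


_LOCATION = re.compile(
    r"ul\(\s*([\d.]+)\s*,\s*([\d.]+)\s*\)\s*,\s*"
    r"lr\(\s*([\d.]+)\s*,\s*([\d.]+)\s*\)\s*,\s*"
    r"(\d+)\s*,\s*'([^']*)'"
)


def parse_location(spec):
    """Parse "ul(x,y),lr(x,y),page,'text'" into a StringBox."""
    match = _LOCATION.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a string box location: {spec!r}")
    ulx, uly, lrx, lry = (float(g) for g in match.group(1, 2, 3, 4))
    box = StringBox(ulx, uly, lrx, lry, int(match.group(5)), match.group(6))
    # The origin is the upper left corner of the page, so x grows to the
    # right and y grows down the page.
    if box.lrx <= box.ulx or box.lry <= box.uly:
        raise ValueError("lower right corner must be right of and below upper left")
    return box
```

Parsing the Customer Name example yields a box 1.5 inches wide and 0.5 inches high on page 1.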


Indexing parameters

Processing parameters can contain index and conversion parameters, options, and
values. For most reports, the PDF indexer requires at least three indexing
parameters to generate index data:
* TRIGGER
The PDF indexer uses triggers to determine where to locate data. A trigger
instructs the PDF indexer to look for certain information in a specific location on
a page. When the PDF indexer finds the text string in the input file that contains
the information specified in the trigger, it can begin to look for index
information.
- The PDF indexer compares words in the input file with the text string
specified in a trigger.
- The location of the trigger string value must be identified using the x,y
coordinate system and page offsets.
- A maximum of 16 triggers can be specified.
- All triggers must match before the PDF indexer can begin to locate index
information.
* FIELD
The field parameter specifies the location of the data that the PDF indexer uses
to create index values.
- Field definitions are based on TRIGGER1 by default, but can be based on any
of 16 TRIGGER parameters.
- The location of the field must be identified using the x,y coordinate system
and page offsets.
- A maximum of 32 fields can be defined.
- A field parameter can also specify all or part of the actual index value stored
in the database.
* INDEX
The index parameter is where you specify the attribute name and identify the
field or fields on which the index is based. We strongly encourage you to name
the attribute the same as the application group database field name.
- The PDF indexer creates indexes for a group of one or more pages.
- You can concatenate field parameters to form an index.
- A maximum of 32 index parameters can be specified.


The PDF indexer creates a new group and extracts new index values when one 
or more of the index values change. 
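
The limits listed above lend themselves to a quick consistency check before a set of parameters is saved. The following Python sketch is a hypothetical helper, not a tool provided with OnDemand. It assumes that the parameters are available as plain `NAME=value` lines and that an INDEX line names its fields as `FIELDn`.

```python
import re

MAX_TRIGGERS = 16  # limits quoted in the list above
MAX_FIELDS = 32
MAX_INDEXES = 32


def check_parameters(lines):
    """Return a list of problems found in a list of 'NAMEn=value' lines."""
    names = [line.split("=", 1)[0].strip().upper() for line in lines if "=" in line]
    triggers = [n for n in names if n.startswith("TRIGGER")]
    fields = [n for n in names if n.startswith("FIELD")]
    indexes = [n for n in names if n.startswith("INDEX")]

    problems = []
    for kind, found, limit in (
        ("TRIGGER", triggers, MAX_TRIGGERS),
        ("FIELD", fields, MAX_FIELDS),
        ("INDEX", indexes, MAX_INDEXES),
    ):
        if not found:
            problems.append(f"at least one {kind} parameter is expected")
        elif len(found) > limit:
            problems.append(f"too many {kind} parameters: {len(found)} > {limit}")

    # Every FIELDn named on an INDEX line must be defined somewhere.
    for line in lines:
        name, _, value = line.partition("=")
        if name.strip().upper().startswith("INDEX"):
            for ref in re.findall(r"FIELD\d+", value.upper()):
                if ref not in fields:
                    problems.append(f"{name.strip()} refers to undefined {ref}")
    return problems
```

An empty result means that the set passes these basic checks. It does not mean that the coordinates themselves are correct.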


Figure 3 depicts a portion of a page from a sample input file. We’ve enclosed the
text strings that determine the beginning of a group and the index values in
rectangles.

[Figure: sample statement page with the string boxes dimensioned in inches.
Text on the page:
Page 0001
John Smythe
123 Ubik Way North
Meadow Bridge WV 99999-0000
Statement Date: 08/31/1996
Account Number: 0000-3727-1644-0099
Balance: $1,096.54]


Figure 3. Indexing data with the PDF indexer


TRIGGER parameters tell the PDF indexer how to identify the beginning of a 
group in the input. The PDF indexer requires one TRIGGER parameter to identify 
the beginning of a group (statement) in the sample file. FIELD parameters 
determine the location of index values in a statement. Fields are based on the 
location of trigger records. INDEX parameters identify the attribute names of the 
index fields. Indexes are based on one or more field parameters. The following 
parameters could be used to index the report depicted in Figure 3. See Chapter 3
for details about the parameter syntax.

* Define a trigger to search each page in the input data for the text string that
identifies the start of a group (statement):

TRIGGER1=ul(0,0),lr(.75,.25),*,'Page 0001'

* Define fields to identify the location of index data. For the sample report, we
might define four fields:

- FIELD1 identifies the location of customer name index values.

FIELD1=ul(1,1),lr(3.25,1.25),0

- FIELD2 identifies the location of statement date index values.

FIELD2=ul(2,2),lr(2.75,2.25),0

- FIELD3 identifies the location of account number index values.

FIELD3=ul(2,2.25),lr(3.25,2.5),0

- FIELD4 identifies the location of the balance index values.

FIELD4=ul(2,3),lr(2.75,3.25),0

* Define indexes to identify the attribute name for an index value and the field
parameter used to locate the index value.

- INDEX1 identifies the customer name, for values extracted using FIELD1.

INDEX1='cust_name',FIELD1

- INDEX2 identifies the statement date, for values extracted using FIELD2.

INDEX2='sdate',FIELD2

- INDEX3 identifies the account number, for values extracted using FIELD3.

INDEX3='acct_num',FIELD3

- INDEX4 identifies the balance, for values extracted using FIELD4.

INDEX4='balance',FIELD4
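
As a rough illustration of what these parameters produce, the following Python sketch applies the four FIELD boxes to the sample page and builds one index record. The field boxes and attribute names are taken from the parameters above. The word boxes in `PAGE_WORDS` are invented to be consistent with Figure 3, and the extraction logic is a simplified assumption, not the PDF indexer's actual algorithm.

```python
# FIELD boxes from the sample parameters: (upper left, lower right), in inches.
FIELDS = {
    "FIELD1": ((1.0, 1.0), (3.25, 1.25)),
    "FIELD2": ((2.0, 2.0), (2.75, 2.25)),
    "FIELD3": ((2.0, 2.25), (3.25, 2.5)),
    "FIELD4": ((2.0, 3.0), (2.75, 3.25)),
}

# INDEX attribute name -> FIELD parameter on which it is based.
INDEXES = {
    "cust_name": "FIELD1",
    "sdate": "FIELD2",
    "acct_num": "FIELD3",
    "balance": "FIELD4",
}

# Hypothetical word boxes for the sample page: (text, ulx, uly, lrx, lry).
PAGE_WORDS = [
    ("Page", 0.0, 0.0, 0.35, 0.25), ("0001", 0.4, 0.0, 0.75, 0.25),
    ("John", 1.0, 1.0, 1.45, 1.25), ("Smythe", 1.5, 1.0, 2.1, 1.25),
    ("123", 1.0, 1.25, 1.3, 1.5),
    ("Statement", 1.0, 2.0, 1.6, 2.25), ("Date:", 1.65, 2.0, 1.95, 2.25),
    ("08/31/1996", 2.0, 2.0, 2.75, 2.25),
    ("Account", 1.0, 2.25, 1.5, 2.5), ("Number:", 1.55, 2.25, 1.95, 2.5),
    ("0000-3727-1644-0099", 2.0, 2.25, 3.25, 2.5),
    ("Balance:", 1.0, 3.0, 1.6, 3.25), ("$1,096.54", 2.0, 3.0, 2.75, 3.25),
]


def field_value(words, ul, lr):
    """Join the words whose boxes lie entirely inside the field box."""
    inside = [
        w for w in words
        if w[1] >= ul[0] and w[2] >= ul[1] and w[3] <= lr[0] and w[4] <= lr[1]
    ]
    inside.sort(key=lambda w: (w[2], w[1]))  # top to bottom, then left to right
    return " ".join(w[0] for w in inside)


def index_record(words):
    """Return {attribute name: extracted value} for one group."""
    return {
        name: field_value(words, *FIELDS[field]) for name, field in INDEXES.items()
    }
```

Calling `index_record(PAGE_WORDS)` yields one value per INDEX parameter, which corresponds to the index data for a single statement.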


How to create indexing parameters 


There are two parts to creating indexing parameters. First, process sample input 
data to determine the x,y coordinates of the text strings the PDF indexer uses to 
identify groups and locate index data. Then, create the indexing parameters using 
the administrative client. 


OnDemand provides the ARSPDUMP command to help you determine the 
location of trigger and field string values in the input data. The ARSPDUMP 
command processes one or more pages of sample report data and generates an 
output file. The output file contains one record for each text string on a page. Each 
record contains the x,y coordinates for a box imposed over the text string (upper 
left, lower right). 


The process works as follows: 
* Obtain a printed copy of the sample report. 
* Identify the string values that you want to use to locate triggers and fields.


* Identify the number of the page where each string value appears. The number is 
the sheet number, not the page identifier. The sheet number is the order of the 
page as it appears in the file, beginning with the number 1 (one), for the first 
page in the file. A page identifier is user-defined information that identifies each 
page (for example, iv, 5, and 17-3). 

* Process one or more pages of the report with the ARSPDUMP command. 


* In the output file, locate the records that contain the string values and make a 
note of the x,y coordinates. 

* Create TRIGGER and FIELD parameters using the x,y coordinates, page number, 
and string value. 
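
The last step of the process can be sketched in Python. This is only an illustration: the helper names fmt, box, trigger_param, and field_param are hypothetical and are not part of OnDemand, and the coordinates are assumed to be the values you noted from the ARSPDUMP output file.

```python
def fmt(n):
    """Format a coordinate without a trailing '.0' (2.0 -> '2', 2.25 -> '2.25')."""
    return f"{n:g}"


def box(ul, lr):
    """Return the 'ul(x,y),lr(x,y)' portion shared by TRIGGER and FIELD parameters."""
    return f"ul({fmt(ul[0])},{fmt(ul[1])}),lr({fmt(lr[0])},{fmt(lr[1])})"


def trigger_param(n, ul, lr, page, value):
    """Build a TRIGGERn line from a box, a page value, and a string value."""
    return f"TRIGGER{n}={box(ul, lr)},{page},'{value}'"


def field_param(n, ul, lr, page):
    """Build a FIELDn line from a box and a page value (no optional keywords)."""
    return f"FIELD{n}={box(ul, lr)},{page}"


# Coordinates as they might be noted from the ARSPDUMP output (hypothetical values).
print(trigger_param(2, (1, 2.25), (2, 2.5), 0, "Account Number"))
print(field_param(3, (2, 2.25), (3.25, 2.5), 0))
```

The generated lines can then be pasted into the edit window of the administrative client.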


Indexing parameters are part of the OnDemand application. The administrative
client provides an edit window you can use to maintain indexing parameters for
the application.


Chapter 2. System considerations


System limitations 


If you are using the OnDemand PDF indexer to generate index data for PostScript 
and PDF files that are created by user-defined programs, you need to keep the 
following in mind: 


* The PDF indexer can process PDF input files that are up to 2 GB in size.
* The PDF indexer supports DBCS languages. However, IBM does not provide any
DBCS fonts. You can purchase DBCS fonts from Adobe. The PDF indexer
supports all DBCS fonts, except encrypted Japanese fonts.


* Input data delimited with PostScript Passthrough markers cannot be indexed.


* The Adobe Toolkit does not validate link destinations or bookmarks to other 
pages in a document or to other documents. Links or bookmarks may or may 
not resolve correctly, depending on how you segment your documents. 


* If a font is referenced but not embedded in a PDF file, the Acrobat viewing 
software attempts to find the font using information contained in the PDF font 
descriptor. If the Acrobat viewing software finds the font, it uses the font to 
display the text. If the Acrobat viewing software does not find the font, it 
displays the text using a substitute Type 1 font. 


Input data requirements 


The PDF indexer processes PDF input data. PostScript data generated by 
applications must be processed by Acrobat Distiller before you run the PDF 
indexer. The online documentation provided with Acrobat Distiller describes 
methods you can use to generate PDF data. 


If you plan to automate the data indexing and loading process on the OnDemand 
server, the input file name must identify the application group and application to 
load. Use the following convention to name your input files: 


MVS.JOBNAME.DATASET.FORMS.YYDDD.HHMMSST.PDF


By default, the ARSLOAD program uses the FORMS part of the filename to identify 
the application group to load. You can use the -G parameter to specify a different 
part of the filename that identifies the application group. For example, arsload -G 
JOBNAME. If the application group contains more than one application, you must 
identify the application to load. Otherwise the load will fail. For example, to use 
the DATASET part of the filename to identify the application, run the ARSLOAD 
program with the -A DATASET parameter. Choose one of the MVS™, JOBNAME,
DATASET, and FORMS parts of the filename to identify the application group and 
application. 


Note: The case of the identifier PDF is ignored. Application group and application 
names are case sensitive and may include special characters such as the 
blank character. 
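
The naming convention can be illustrated with a short Python sketch. It is not the ARSLOAD implementation; the function names split_load_name and pick are hypothetical, the sample file name is invented, and the sketch assumes that none of the name parts contains a period.

```python
PART_NAMES = ("MVS", "JOBNAME", "DATASET", "FORMS", "YYDDD", "HHMMSST")


def split_load_name(filename):
    """Split MVS.JOBNAME.DATASET.FORMS.YYDDD.HHMMSST.PDF into named parts."""
    pieces = filename.split(".")
    # The case of the PDF identifier is ignored.
    if len(pieces) != 7 or pieces[-1].upper() != "PDF":
        raise ValueError(f"unexpected input file name: {filename!r}")
    return dict(zip(PART_NAMES, pieces[:6]))


def pick(filename, group_part="FORMS", app_part=None):
    """Return (application group, application) as selected by -G and -A.

    group_part defaults to FORMS, mirroring the documented default.
    """
    parts = split_load_name(filename)
    group = parts[group_part]
    application = parts[app_part] if app_part else None
    return group, application


name = "MVS.PAYJOB.STMTS.CREDIT.03123.1015300.pdf"  # hypothetical file name
print(pick(name))                                   # default: FORMS part
print(pick(name, group_part="JOBNAME", app_part="DATASET"))
```

Because application group and application names are case sensitive, the sketch returns the parts exactly as they appear in the file name.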


NLS considerations 
DBCS languages are not currently supported by the PDF indexer. 

Data values that you specify on TRIGGER and FIELD parameters must be encoded
in the same code page as the document. For example, if the characters in the 
document are encoded in code page 500, any data values that you specify on 
TRIGGER and FIELD parameters must be encoded in code page 500. Examples of 
data values that you might specify include TRIGGER string values and FIELD 
default and constant values. 


For more information about NLS in OnDemand, see the IBM Content Manager 
OnDemand for iSeries Common Server Planning and Installation Guide. 


Chapter 3. Parameter reference


This parameter reference assumes that you will use the ARSLOAD program to 
process your input files. When you use the ARSLOAD program to process input 
files, the PDF indexer ignores any values that you may provide for the INDEXDD, 
INPUTDD, MSGDD, OUTPUTDD, and PARMDD parameters. If you run the 
ARSPDOCI program from the command prompt or call it from a user-defined 
program, then you must provide values for the INPUTDD, OUTPUTDD, and 
PARMDD parameters and verify that the default values for the INDEXDD and 
MSGDD parameters are correct. 


COORDINATES 


Identifies the metrics used for x,y coordinates in the FIELD and TRIGGER 
parameters. 


Required? 
No 


Default Value 
IN 


Syntax 


COORDINATES=metric 


Options and values 


The metric can be: 
IN 
The coordinate metrics are specified in inches (the default). 
CM 
The coordinate metrics are specified in centimeters. 
MM 
The coordinate metrics are specified in millimeters. 
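
As a rough illustration of how the three metrics relate, the following Python sketch converts a coordinate to inches. The names to_inches and in_supported_range are hypothetical. The 0 to 45 inch range is taken from the FIELD and TRIGGER descriptions; applying it after conversion is an assumption of this sketch.

```python
# Conversion factors from each COORDINATES metric to inches.
INCHES_PER_UNIT = {"IN": 1.0, "CM": 1 / 2.54, "MM": 1 / 25.4}
MAX_INCHES = 45  # documented upper bound for page width and length


def to_inches(value, metric="IN"):
    """Convert a coordinate expressed in IN, CM, or MM to inches."""
    return value * INCHES_PER_UNIT[metric.upper()]


def in_supported_range(value, metric="IN"):
    """Check a coordinate against the 0 to 45 inch range (assumed to apply after conversion)."""
    return 0 <= to_inches(value, metric) <= MAX_INCHES


print(to_inches(25.4, "MM"))
print(in_supported_range(120, "CM"))
```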


FIELD 


Identifies the location of index data and can provide default and constant index 
values. You must define at least one field. You can define up to 32 fields. You can 
define two types of fields: a trigger field, which is based on the location of a trigger 
string value and a constant field, which provides the actual index value that is 
stored in the database. 


Required? 
Yes 


Default Value 
<none> 


Trigger field syntax 


FIELDn=ul(x,y),lr(x,y),page[,(TRIGGER=n,BASE={0|TRIGGER},
MASK='field_mask',DEFAULT='value')]


Options and values
n 


The field parameter identifier. When adding a field parameter, use the next 
available number, beginning with 1 (one). 

ul(x,y) 

The coordinates for the upper left corner of the field string box. The field string 
box is the smallest rectangle that completely encloses the field string value (one 
or more words on the page). The PDF indexer must find the field string value 
inside the field string box. The supported range of values is 0 to 45, page width 
and length, in inches. 

lr(x,y)

The coordinates for the lower right corner of the field string box. The field 
string box is the smallest rectangle that completely encloses the field string 
value (one or more words on the page). The PDF indexer must find the field 
string value inside the field string box. The supported range of values is 0 to 
45, page width and length, in inches. 


page 

The sheet number where the PDF indexer begins searching for the field, relative 
to a trigger or 0 (zero) for the same page as the trigger. If you specify BASE=0, 
the page value can be -16 to 16. If you specify BASE=TRIGGER, the page value
must be 0 (zero), which is relative to the sheet number where the trigger string 
value is located. 


TRIGGER=n 


Identifies the trigger parameter used to locate the field. This is an optional 
keyword, but the default is TRIGGER1. Replace n with the number of a defined 
TRIGGER parameter. 


BASE={0|TRIGGER}


Determines whether the PDF indexer uses the upper left coordinates of the 
trigger string box to locate the field. Choose from 0 (zero) or TRIGGER. If 
BASE=0, the PDF indexer adds zero to the field string box coordinates. If 
BASE=TRIGGER, the PDF indexer adds the upper left coordinates of the 
location of the trigger string box to the coordinates provided for the field string 
box. This is an optional keyword, but the default is BASE=0. 


You should use BASE=0 if the field data always starts in a specific area on the 
page. You should use BASE=TRIGGER if the field is not always located in the 
same area on every page, but is always located a specific distance from a 
trigger. This capability is useful when the number of lines on a page varies, 
causing the location of field values to change. For example, given the following 
parameters: 

TRIGGER2=ul(4,4),lr(5,8),1,'Total'

FIELD2=ul(1,0),lr(2,1),0,(TRIGGER=2,BASE=TRIGGER)


The trigger string value can be found in a one by four inch rectangle. The PDF
indexer always locates the field in a one inch box, one inch to the right of the
location of the trigger string value. If the PDF indexer finds the trigger string
value in location ul(4,4),lr(5,5), it attempts to find the field in location
ul(5,4),lr(6,5). If the PDF indexer finds the trigger string value in location
ul(4,6),lr(5,7), it attempts to find the field in location ul(5,6),lr(6,7).


Note: Beginning with Version 5.2, a field that is based on the location of a 
trigger (BASE=TRIGGER) can be defined at any location on the page that 
contains the trigger. Previously, a field that was based on the location of 
a trigger had to be defined to the right and below the upper left point of 
the trigger. With this change, the x or y values can be negative, so long
as the resulting absolute coordinates of the field string rectangle are
still in the range of 0 <= x <= 45 and 0 <= y <= 45. The ul(x,y) and
lr(x,y) coordinates of the FIELD parameter are relative offsets from the
ul(x,y) coordinates of the trigger. For example, suppose the field string
rectangle is located at ul(1,1),lr(2,2), which is an absolute location on
the page. If the trigger string rectangle is located at ul(5,5),lr(7,7),
then the field coordinates would be ul(-4,-4),lr(-3,-3).

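The BASE arithmetic described above can be sketched in Python. The function name resolve_field_box is hypothetical, and the sketch only models the coordinate addition and the 0 to 45 range check stated in the text; it does not represent the internals of the PDF indexer.

```python
def resolve_field_box(field_ul, field_lr, base, trigger_ul=(0, 0)):
    """Return the absolute (ul, lr) corners of a field string box.

    base is 0 or "TRIGGER". With BASE=TRIGGER the upper left corner of the
    located trigger string box is added to both field corners; with BASE=0
    zero is added, so the field coordinates are already absolute.
    """
    dx, dy = trigger_ul if base == "TRIGGER" else (0, 0)
    ul = (field_ul[0] + dx, field_ul[1] + dy)
    lr = (field_lr[0] + dx, field_lr[1] + dy)
    for x, y in (ul, lr):
        if not (0 <= x <= 45 and 0 <= y <= 45):
            raise ValueError(f"resolved coordinate ({x},{y}) is outside 0 to 45")
    return ul, lr


# Trigger found at ul(4,4): the field is searched at ul(5,4),lr(6,5).
print(resolve_field_box((1, 0), (2, 1), "TRIGGER", trigger_ul=(4, 4)))
# Negative offsets: trigger at ul(5,5), field resolves to ul(1,1),lr(2,2).
print(resolve_field_box((-4, -4), (-3, -3), "TRIGGER", trigger_ul=(5, 5)))
```
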

MASK='field_mask'


The pattern of symbols that the PDF indexer matches to data located in the 
field. When you define a field that includes a mask, an INDEX parameter based 
on the field cannot reference any other fields. Valid mask symbols can include: 


@ Matches alphabetic characters. For example:

MASK='@@@@@@@@@@@@@@@'

Causes the PDF indexer to match a 15-character alphabetic field, such
as a name.

# Matches numeric characters. For example:

MASK='##########'

Causes the PDF indexer to match a 10-character numeric field, such as
an account number.

¬ Matches any non-blank character.
^ Matches any non-blank character.
% Matches the blank character and numeric characters.
= Matches any character.


Note: The string that you specify for the mask can contain any character. For
example, given the following definitions:


TRIGGER2=*,25,'ACCOUNT'
FIELD2=0,38,11,(TRIGGER=2,BASE=0,MASK='@000-####-#')


The PDF indexer selects the field only if the data in the field columns
contains an eleven-character string composed of any letter, three zeros, a
dash character, any four numbers, a dash character, and any number.
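

The mask rules can be approximated with a small Python function. This is a sketch under assumptions, not the matching code of the PDF indexer: the function name mask_matches is hypothetical, the two non-blank symbols are assumed to be ¬ and ^, the field data is assumed to have the same length as the mask, and Python's own notions of alphabetic and numeric characters stand in for the indexer's.

```python
# One test per mask symbol; any other mask character must match literally.
MASK_TESTS = {
    "@": str.isalpha,                        # alphabetic characters
    "#": str.isdigit,                        # numeric characters
    "¬": lambda c: c != " ",                 # any non-blank character (assumed symbol)
    "^": lambda c: c != " ",                 # any non-blank character (assumed symbol)
    "%": lambda c: c == " " or c.isdigit(),  # blank or numeric
    "=": lambda c: True,                     # any character
}


def mask_matches(mask, text):
    """Return True if text satisfies mask, comparing character by character."""
    if len(mask) != len(text):
        return False
    for symbol, char in zip(mask, text):
        test = MASK_TESTS.get(symbol)
        if test is None:
            if symbol != char:      # literal character in the mask
                return False
        elif not test(char):
            return False
    return True


print(mask_matches("@000-####-#", "A000-1234-5"))  # True
print(mask_matches("@000-####-#", "A001-1234-5"))  # False
```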


DEFAULT='value'


Defines the default index value, when there are no words within the 
coordinates provided for the field string box. 

For example, assume that an application program generates statements that 
contain an audit field. The contents of the field can be PASSED or FAILED. 
However, if a statement has not been audited, the application program does not 
generate a value. In that case, there are no words within the field string box. To 
store a default value in the database for unaudited records, define the field as 
follows: 


FIELD3=ul(8,1),lr(8.5,1.25),1,(DEFAULT='NOT AUDITED')


The PDF indexer assigns the index associated with FIELD3 the value NOT 
AUDITED, if the field string box is blank. 


Examples

The following field parameter causes the PDF indexer to locate the field at the 
coordinates provided for the field string box. The field is based on TRIGGER1 and 
located on the same page as TRIGGER1. We specify BASE=0 because the field 
string box always appears in a specific location on the page. 


TRIGGER1=ul(0,0),lr(.75,.25),*,'Page 0001'
FIELD1=ul(1,1),lr(3.25,1.25),0,(TRIGGER=1,BASE=0)


Constant field syntax 
FIELDn='constant'


Options and values 
n 


The field parameter identifier. When adding a field parameter, use the next 
available number, beginning with 1 (one). 


'constant'


The literal (constant) string value of the field. This is the index value stored in 
the database. The constant value can be 1 to 250 bytes in length. The PDF 
indexer does not validate the type or content of the constant. 


Examples 
The following field parameter causes the PDF indexer to store the same text string 
in each INDEX1 value it creates. 


FIELD1='000000000' 
INDEX1='acct',FIELD1 


The following field parameters cause the PDF indexer to concatenate a constant 
value with the index value extracted from the data. The PDF indexer concatenates 
the constant value specified in the FIELD1 parameter to each index value located 
using the FIELD2 parameter. The concatenated string value is stored in the 
database. In this example, the account number field in the data is 14 bytes in 
length. However, the account number in the database is 19 bytes in length. Use a 
constant field to concatenate a constant five byte prefix (0000-) to all account 
numbers extracted from the data. 

FIELD1='0000-'

FIELD2=ul(2,2),lr(2.5,2.25),0,(TRIGGER=1,BASE=0)

INDEX1='acct_num',FIELD1,FIELD2

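The concatenation in this example can be pictured with a few lines of Python. The function name index_value and the sample account number are hypothetical; the sketch only shows that the field values are joined in the order in which they are listed on the INDEX parameter.

```python
def index_value(field_values):
    """Join field values in the order they are listed on the INDEX parameter."""
    return "".join(field_values)


field1 = "0000-"            # constant field (FIELD1)
field2 = "12345678901234"   # hypothetical 14-byte value located by FIELD2

value = index_value([field1, field2])
print(value, len(value))    # the stored value is 19 bytes long
```
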

Related parameters 


INDEX parameter
TRIGGER parameter


FONTLIB 


Identifies the directory or directories in which fonts are stored. Specify any valid 
path. The PDF indexer searches for fonts in the order that the paths are listed. If a 
font is referenced in an input file but not embedded in the file, the PDF indexer 
attempts to locate the font in the directory or directories listed on the FONTLIB 
parameter. If the font is located, the PDF indexer adds it to the output file. If the 
font cannot be located, the Adobe viewing software displays the text using a 
substitute Type 1 font when the document is retrieved by a client program. 


Required? 
No 


Default Value
/QIBM/ProdData/OnDemand/Adobe/fonts


Syntax 


FONTLIB=pathlist 


Options and values 


The pathlist is a colon-separated string of one or more valid path names. For 
example: 


/QIBM/ProdData/OnDemand/Adobe/fonts:/mycustom/fonts


The PDF indexer searches the paths in the order in which they are specified. 
Delimit path names with the colon (:) character. 
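
The search order can be sketched in Python as follows. The function name find_font and the font file name are hypothetical, and the sketch assumes that a font can be located by file name; how the PDF indexer actually maps a font reference to a file is not described here.

```python
import os


def find_font(font_file, fontlib):
    """Return the first match for font_file, searching FONTLIB paths in order."""
    for directory in fontlib.split(":"):   # paths are delimited with a colon
        candidate = os.path.join(directory, font_file)
        if os.path.isfile(candidate):
            return candidate
    return None                            # not found in any listed directory


fontlib = "/QIBM/ProdData/OnDemand/Adobe/fonts:/mycustom/fonts"
print(find_font("Example.pfb", fontlib))   # hypothetical font file name
```

If the same font exists in more than one directory, the copy in the directory listed first is the one that is used.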


INDEX 


Identifies the index name and the field or fields on which the index is based. You 
must specify at least one index parameter. You can specify up to 32 index 
parameters. When you create index parameters, we strongly encourage you to 
name the index the same as the application group database field name. 


Required? 
Yes 


Default Value 
<none> 


Syntax 


INDEXn='name',FIELDnn[,...FIELDnn]


Options and values 


n 


The index parameter identifier. When adding an index parameter, use the next 
available number, beginning with 1 (one). 


'name'


Determines the index name associated with the actual index value. For 
example, assume INDEX1 is to contain account numbers. The string acct_num 
would be a meaningful index name. The index value of INDEX1 would be an 
actual account number, for example, 000123456789. 


The index name is a string from 1 to 250 bytes in length. We strongly encourage 
you to name the index the same as the application group database field name. 


FIELDnn 


The name of the field parameter or parameters that the PDF indexer uses to 
locate the index. You can specify a maximum of 32 field parameters. Separate 
the field parameter names with a comma. The total length of all the specified 
field parameters cannot exceed 250 bytes. 


Examples 


The following index parameter causes the PDF indexer to create group-level 
indexes for date index values (the PDF indexer supports group-level indexes only). 
When the index value changes, the PDF indexer closes the current group and 
begins a new group. 


INDEX1='report_date',FIELD1


The following index parameters cause the PDF indexer to create group-level 
indexes for customer name and account number index values. The PDF indexer 
closes the current group and begins a new group when either the customer name 
or the account number index value changes. 


INDEX1='name',FIELD1
INDEX2='acct_num',FIELD2

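The group-break behavior described in these examples can be modeled in Python. This is a simplified sketch: the function name split_into_groups is hypothetical, and it assumes that index values have already been extracted for every page, which glosses over trigger and field location.

```python
def split_into_groups(page_values):
    """Group consecutive pages that share the same index values.

    page_values holds one tuple of index values per page, in sheet order.
    Returns a list of (index values, [sheet numbers]) pairs.
    """
    groups = []
    for sheet, values in enumerate(page_values, start=1):
        if groups and groups[-1][0] == values:
            groups[-1][1].append(sheet)      # same values: stay in current group
        else:
            groups.append((values, [sheet])) # any value changed: begin a new group
    return groups


pages = [
    ("Smith", "001"),
    ("Smith", "001"),
    ("Smith", "002"),   # account number changes
    ("Jones", "002"),   # customer name changes
]
for values, sheets in split_into_groups(pages):
    print(values, sheets)
```
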

Related parameters 
FIELD parameter


INDEXDD 


Determines the name or the full path name of the index object file. The PDF 
indexer writes indexing information to the index object file. If you specify the file 
name without a path, then the PDF indexer puts the index object file in the current 
directory. If you do not specify the INDEXDD parameter, then the PDF indexer 
writes indexing information to the file INDEX. 


Required? 
No 


Note: When you process input files with the ARSLOAD program, the PDF 
indexer ignores any value that you may supply for the INDEXDD 
parameter. If you process input files with the ARSPDOCI program, 
then verify the value of the INDEXDD parameter. 


Default Value 
INDEX 


Syntax 
INDEXDD=filename 


Options and values 


The filename is a valid filename or full path name. 


INDEXSTARTBY 


Determines the page number by which the PDF indexer must locate the first group 
(document) within the input file. The first group is identified when all of the 
triggers and fields are found. For example, with the following parameters: 

TRIGGER1=ul(4.72,1.28),lr(5.36,1.45),*,'ACCOUNT'

TRIGGER2=ul(6.11,1.43),lr(6.79,1.59),1,'SUMMARY'

INDEX1='Account',FIELD1,FIELD2

FIELD1=ul(6.11,1.29),lr(6.63,1.45),2

FIELD2=ul(6.69,1.29),lr(7.04,1.45),2

INDEX2='Total',FIELD3

FIELD3=ul(6.11,1.43),lr(6.79,1.59),2

INDEXSTARTBY=3 


The word ACCOUNT must be found on a page in the location described by
TRIGGER1. The word SUMMARY must be found on a page following the page on
which ACCOUNT was found, in the location specified by TRIGGER2. In addition,
there must be one or more words found for fields FIELD1, FIELD2, and FIELD3 in
the locations specified by FIELD1, FIELD2, and FIELD3, which are located on a
page that is two pages after the page on which TRIGGER1 was found.


In the example, the first group in the file must start on either page one, page two, 
or page three. If TRIGGER1 is found on page one, then TRIGGER2 must be found 
on page two and FIELD1, FIELD2, and FIELD3 must be found on page three. 


The PDF indexer stops processing if it does not locate the first group by the 
specified page number. This parameter is optional, but the default is that the PDF 
indexer must locate the first group on the first page of the input file. This 
parameter is helpful if the input file contains header pages. For example, if the 
input file contains two header pages, you can specify a page number one greater 
than the number of header pages (INDEXSTARTBY=3) so that the PDF indexer will 
stop processing only if it does not locate the first group by the third page in the 
input data. 


Note: When you use INDEXSTARTBY to skip header pages, the PDF indexer does 
not copy non-indexed pages to the output file or store them in OnDemand. 
For example, if you specify INDEXSTARTBY=3 and the first group is found 
on page three, then pages one and two are not copied to the output file or 
stored in OnDemand. If you specify INDEXSTARTBY=3 and the first group 
is found on page two, then page one is not copied to the output file or 
stored in OnDemand. 
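
The effect described in the note can be sketched in Python. The function name pages_to_store is hypothetical, and the sketch models only the behavior stated above: processing stops if the first group is not located by the INDEXSTARTBY page, and pages that precede the first group are not copied.

```python
def pages_to_store(first_group_page, total_pages, indexstartby=1):
    """Return the sheet numbers that are copied to the output file.

    first_group_page is the sheet on which the first group was located,
    or None if no group was located at all.
    """
    if first_group_page is None or first_group_page > indexstartby:
        raise RuntimeError(f"first group not located by page {indexstartby}")
    # Pages before the first group (for example, header pages) are skipped.
    return list(range(first_group_page, total_pages + 1))


print(pages_to_store(3, 5, indexstartby=3))  # [3, 4, 5]
print(pages_to_store(2, 5, indexstartby=3))  # [2, 3, 4, 5]
```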

Required? 

No 


Default Value 
1 


Syntax 
INDEXSTARTBY=value 


Options and values 


The value is the page number by which the PDF indexer must locate the first group 
(document) in the input file. 


INPUTDD 


Identifies the name or the full path name of the PDF input file that the PDF 
indexer will process. 


Required? 
No 


Note: When you process input files with the ARSLOAD program, the PDF 
indexer ignores any value that you may supply for the INPUTDD 
parameter. If you process input files with the ARSPDOCI program, 
then you must specify a value for the INPUTDD parameter. 


Default Value 
<none> 


Syntax 
INPUTDD=name 


Options and values


The name is the file name or full path name of the input file. If you specify the file 
name without a path, the PDF indexer searches the current directory for the 
specified file. 


MSGDD 


Determines the name or the full path name of the file where the PDF indexer 
writes error messages. If you do not specify the MSGDD parameter, the PDF 
indexer writes messages to the display (interactive) or the joblog (batch). 


Required? 
No 


Note: When you process input files with the ARSLOAD program, the PDF 
indexer ignores any value that you may supply for the MSGDD 
parameter. If you process input files with the ARSPDOCI program, 
then verify the value of the MSGDD parameter. 


Default Value 
the display (interactive) or the joblog (batch), which are sometimes referred 
to as stderr (standard error) 


Syntax 
MSGDD=name 


Options and values 


The name is the file name or full path name where the PDF indexer writes error 
messages. If you specify the file name without a path, the PDF indexer places the 
error file in the current directory. 


OUTPUTDD 


Identifies the name or the full path name of the output file. 


Required? 
No 


Note: When you process input files with the ARSLOAD program, the PDF 
indexer ignores any value that you may supply for the OUTPUTDD 
parameter. If you process input files with the ARSPDOCI program, 
then you must specify a value for the OUTPUTDD parameter. 


Default Value 
<none> 


Syntax 
OUTPUTDD=name 


Options and values 


The name is the file name or full path name of the output file. If you specify the 
file name without a path, the PDF indexer puts the output file in the current 
directory. 


PARMDD


Identifies the name or the full path name of the file that contains the indexing
parameters used to process the input data.


Required?
No


Note: When you process input files with the ARSLOAD program, the PDF 
indexer ignores any value that you may supply for the PARMDD 
parameter. If you process input files with the ARSPDOCI program, 
then you must specify a value for the PARMDD parameter. 


Default Value 
<none> 


Syntax 
PARMDD=name 


Options and values 


The name is the file name or full path name of the file that contains the indexing 
parameters. If you specify the file name without a path, the PDF indexer searches 
for the file in the current directory. 


TEMPDIR 


Determines the name of the directory that the PDF indexer uses for temporary 
work space. 


Required? 
No 


Default Value 
/arstmp 


Syntax 
TEMPDIR=directory 


Options and values 


The directory is a valid directory name. 


TRIGGER 


Identifies locations and string values required to uniquely identify the beginning of 
a group and the locations and string values of fields used to define indexes. You 
must define at least one trigger and can define up to sixteen triggers. 


Required? 
Yes 


Default Value 
<none> 


Syntax 
TRIGGERn=ul(x,y),lr(x,y),page,'value'


Options and values


n


The trigger parameter identifier. When adding a trigger parameter, use the next 
available number, beginning with 1 (one). 

ul(x,y) 

The coordinates for the upper left corner of the trigger string box. The trigger 
string box is the smallest rectangle that completely encloses the trigger string 
value (one or more words on the page). The PDF indexer must find the trigger 
string value inside the trigger string box. The supported range of values is 0 to 
45, page width and length, in inches. 

lr(x,y)

The coordinates for the lower right corner of the trigger string box. The trigger
string box is the smallest rectangle that completely encloses the trigger string
value (one or more words on the page). The PDF indexer must find the trigger
string value inside the trigger string box. The supported range of values is 0
(zero) to 45, page width and length, in inches.


page 
The page number in the input file on which the trigger string value must be 
located. 


— For TRIGGER1, the page value must be an asterisk (*), to specify that the 
trigger string value can be located on any page in the input file. The PDF 
indexer begins searching on the first page in the input file. The PDF indexer 
continues searching until the trigger string value is located, the 
INDEXSTARTBY value is reached, or the last page of the input file is 
searched, whichever occurs first. If the PDF indexer reaches the 
INDEXSTARTBY value or the last page and the trigger string value is not 
found, then an error occurs and indexing stops. 


— For all other triggers, the page value can be 0 (zero) to 16, relative to 
TRIGGER1. For example, the page value 0 (zero) means that the trigger is 
located on the same page as TRIGGER1, the value 1 (one) means that the
trigger is located on the page after the page that contains TRIGGER1, and so
forth. For TRIGGER2 through TRIGGER16, the trigger string value can be a
maximum of 16 pages from TRIGGER1.


'value'


The actual string value the PDF indexer uses to match the input data. The 
string value is case sensitive. The value is one or more words that can be found 
on a page. 


Examples 


TRIGGER1 

The following TRIGGER1 parameter causes the PDF indexer to search the specified 
location on every page of the input data for the specified string. You must define 
TRIGGER1 and the page value for TRIGGER1 must be an asterisk. 


TRIGGER1=ul(0,0),lr(.75,.25),*,'Page 0001'


Group triggers 

The following trigger parameter causes the PDF indexer to attempt to match the 
string value Account Number within the coordinates provided for the trigger string 
box. The trigger can be found on the same page as TRIGGER1. 


TRIGGER2=ul(1,2.25),lr(2,2.5),0,'Account Number'


The following trigger parameter causes the PDF indexer to attempt to match the
string value Total within the coordinates provided for the trigger string box. In
this example, we've defined a one by four inch trigger string box, because the
vertical position of the trigger on the page may vary. For example, assume that the
page contains account numbers and balances with a total for all of the accounts
listed. There can be one or more accounts listed. The location of the total varies,
depending on the number of accounts listed. The field parameter is based on the
trigger so that the PDF indexer can locate the field regardless of the actual location
of the trigger string value. The field is a one inch box that always begins one inch
to the right of the trigger. After locating the trigger string value, the PDF indexer
adds the upper left coordinates of the trigger string box to the coordinates
provided for the field. The trigger can be found on the page following TRIGGER1.


TRIGGER2=ul(4,4),lr(5,8),1,'Total'

FIELD2=ul(1,0),lr(2,1),0,(TRIGGER=2,BASE=TRIGGER)


Related parameters 
FIELD parameter


Chapter 4. Message reference


Introduction 


The PDF indexer creates a message list at the end of each indexing run. A return 
code of 0 (zero) means that processing completed without any errors. 


The PDF indexer detects a number of error conditions that can be logically 
grouped into several categories: 


* Informational


When the PDF indexer processes a file, it issues informational messages that 
allow the user to determine if the correct processing parameters have been 
specified. These messages can assist in providing an audit trail. 


* Warning


The PDF indexer issues a warning message and a return code of 4 (four) when 
the fidelity of the document may be in question. 


* Error


The PDF indexer issues an error message and return code of 8 (eight) or 16 
(sixteen) and terminates processing the current input file. Most error conditions 
detected by the PDF indexer fall into this category. The exact method of 
termination may vary. For certain severe errors, the PDF indexer may fail with a
segmentation fault. This is generally the case when some system service fails. In other
cases, the PDF indexer terminates with the appropriate error messages written
either to the display (interactive) or the joblog (batch), sometimes referred to as
stderr (standard error), or to a file. When the PDF indexer is invoked by the
ARSLOAD data loading program, error messages are automatically written to 
the system log. If you execute the ARSPDOCI command, you can specify the 
name or the full path name of the file to contain processing messages with the 
MSGDD parameter. 


* Adobe Toolkit


Messages generated by the Adobe Toolkit.


* Internal Error


The PDF indexer issues an error message and return code of 16 (sixteen) and 
terminates processing the current input file. 
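
A calling program might interpret the return codes listed above along the following lines. This Python sketch is illustrative only; the function name describe_return_code is hypothetical, and codes other than those named in this chapter are simply reported as unknown.

```python
# Return codes and categories as described in this chapter.
RETURN_CODES = {
    0: "completed without errors",
    4: "warning: the fidelity of the document may be in question",
    8: "error: processing of the current input file terminated",
    16: "error or internal error: processing of the current input file terminated",
}


def describe_return_code(rc):
    """Map a PDF indexer return code to the category described in the text."""
    return RETURN_CODES.get(rc, "unknown return code")


def should_load(rc):
    """Treat only 0 and 4 as results worth loading (a policy assumed by this sketch)."""
    return rc in (0, 4)


print(describe_return_code(4))
print(should_load(8))
```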


Messages


ARS4900I Usage: arspdoci [parmdd filename]
Version: version 
Coordinates: Metrics (units X and Y are specified in) 
Inches | Centimeters | Millimeters 
Fontlib: Font directory 
Inputdd: Input filename 
Msgdd: Message filename - default is stdout 
Outputdd: Output filename pattern 
TraceDD: Trace file - default is stderr 
Trace: What to trace - default is off 
API | WORDS | FCNS | INDEX | ALL 


Explanation: An incorrect parameter was specified for the command. 


User Response: Resubmit the command with the correct parameters. For more information about this command,
see Chapter 5, “ARSPDOCI reference”.


ARS4901I parameter
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4902I Number of input pages = pages 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4903E keyword keyword contains non-numeric identifier


Explanation: The identifier for the specified keyword must be a number from 1 to 16 (TRIGGER parameter) or 1 to 
32 (INDEX or FIELD parameter). 


User Response: Correct the identifier and then resubmit the command. 
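
The identifier rule behind this message can be checked before the parameters are submitted. The following Python sketch is an assumption-laden illustration: the function name valid_identifier is hypothetical, and it covers only the numbered TRIGGER, FIELD, and INDEX keywords with the limits given in the explanation.

```python
import re

# Highest identifier allowed for each numbered keyword.
LIMITS = {"TRIGGER": 16, "FIELD": 32, "INDEX": 32}


def valid_identifier(keyword):
    """Return True if keyword is TRIGGERn, FIELDn, or INDEXn with n in range."""
    match = re.fullmatch(r"(TRIGGER|FIELD|INDEX)(\d+)", keyword)
    if match is None:
        return False                # missing or non-numeric identifier
    name, number = match.group(1), int(match.group(2))
    return 1 <= number <= LIMITS[name]


print(valid_identifier("TRIGGER16"))  # True
print(valid_identifier("TRIGGER17"))  # False
print(valid_identifier("FIELDA"))     # False
```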


ARS4904E Error allocating bytes bytes memory 
Explanation: The PDF indexer was unable to allocate the requested amount of memory. 


User Response: Decrease the load on the system or increase the amount of memory available to the PDF indexer 
and then resubmit the command. 


ARS4905E parameter parameter syntax incorrect
Explanation: The syntax for the trigger, field, or index parameter is not correct. 


User Response: Correct the parameter and then resubmit the command. 


ARS4906E Unknown parameter: parameter 
Explanation: The specified string is not a valid PDF indexer parameter. 


User Response: Correct the parameter and then resubmit the command. 


ARS4907E Incorrect index file definition 
Explanation: The file specified for the INDEXDD file definition parameter is not a valid file name. 


User Response: Correct the file name specified on the file definition parameter and then resubmit the command. 


ARS4908E Incorrect input file definition
Explanation: The file specified for the INPUTDD file definition parameter is not a valid file name. 


User Response: Correct the file name specified on the file definition parameter and then resubmit the command. 


ARS4909E Incorrect output file definition 
Explanation: The file specified for the OUTPUTDD file definition parameter is not a valid file name. 


User Response: Correct the file name specified on the file definition parameter and then resubmit the command. 


ARS4910E Incomplete indexing parameters supplied 
Explanation: The current set of indexing parameters does not permit the PDF indexer to create index data. 


User Response: Correct the parameters and then resubmit the command. 


ARS4911E Error opening INDEX file file_name 
Explanation: The specified file does not exist or the file permissions do not allow the file to be opened. 


User Response: Verify that the file exists and verify that the file permissions allow the file to be opened. Then 
resubmit the command. 


ARS4912E Error opening Input file file_name 
Explanation: The specified file does not exist or the file permissions do not allow the file to be opened. 


User Response: Verify that the file exists and verify that the file permissions allow the file to be opened. Then 
resubmit the command. 


ARS4913E Error opening Parameter file file_name 
Explanation: The specified file does not exist or the file permissions do not allow the file to be opened. 


User Response: Verify that the file exists and verify that the file permissions allow the file to be opened. Then 
resubmit the command. 


ARS4914E Trigger(s) not found by page page 
Explanation: The PDF indexer did not find a trigger by the specified page number. The INDEXSTARTBY parameter 
determines the page number by which the PDF indexer must find a trigger and begin indexing. 


User Response: Verify the page number that is specified on the INDEXSTARTBY parameter. If the page number is 
correct, verify the TRIGGER parameters. Then resubmit the command. 


ARS4915E Field(s) not found by page page 
Explanation: The PDF indexer did not find a field by the specified page number. The INDEXSTARTBY parameter 
determines the page number by which the PDF indexer must find a trigger and begin indexing for a field. 


User Response: Verify the page number that is specified on the INDEXSTARTBY parameter. If the page number is 
correct, verify the FIELD parameters. Then resubmit the command. 


ARS4916E Failed Adobe Toolkit Initialization rc=rc Error string: string 
Explanation: The Adobe toolkit returned an error. 


User Response: Verify the directories that are specified on the FONTLIB and TEMPDIR parameters. Verify the 
directory permissions. Verify that the directories named on the FONTLIB parameter provide access to the fonts that 
are required by the PDF indexer. Verify that the directory named on the TEMPDIR parameter contains sufficient free 
space to process the input file. If the problem persists, contact your IBM Service Representative. 


ARS4917E Create of new Document Segment failed 
Explanation: The Adobe toolkit returned an error when trying to create a new document segment. 


User Response: Verify the directories that are specified on the FONTLIB and TEMPDIR parameters. Verify the files 
and directories that are named on the INPUTDD and OUTPUTDD parameters. Verify the file and directory 
permissions. Verify that the directories that are named on the FONTLIB parameter provide access to the fonts that are 
required by the PDF indexer. Verify that the directories that are named on the TEMPDIR and OUTPUTDD parameters 
contain sufficient free space to process the input file. If the problem persists, contact your IBM Service 
Representative. 


ARS4918E Page extraction failed! 
Explanation: The Adobe toolkit returned an error when trying to extract pages for a new segment. 


User Response: Verify the directories that are specified on the FONTLIB and TEMPDIR parameters. Verify the files 
and directories that are named on the INPUTDD and OUTPUTDD parameters. Verify the file and directory 
permissions. Verify that the directories that are named on the FONTLIB parameter provide access to the fonts that are 
required by the PDF indexer. Verify that the directories that are named on the TEMPDIR and OUTPUTDD parameters 
contain sufficient free space to process the input file. If the problem persists, contact your IBM Service 
Representative. 


ARS4919E Word search or extraction error 
Explanation: The Adobe toolkit returned an error while searching the PDF document. 


User Response: Verify the directories that are specified on the FONTLIB and TEMPDIR parameters. Verify the 
directory permissions. Verify that the directories that are named on the FONTLIB parameter provide access to the 
fonts that are required by the PDF indexer. Verify that the directory named on the TEMPDIR parameter contains 
sufficient free space to process the input file. If the problem persists, contact your IBM Service Representative. 


ARS4920E Error during Distil rc=rc Error string: string Check the Distiller messages 
Explanation: The Acrobat Distiller returned an error while trying to distill the input file. 


User Response: Use the Distiller output messages to determine the cause and resolution of the error. After 
correcting the error, resubmit the command. 


ARS4921E The Input file contains an unsupported data type 
Explanation: The input file does not contain PostScript or PDF data. 


User Response: Verify that the correct file is named on the INPUTDD parameter. Verify that the file named on the 
INPUTDD parameter contains PostScript or PDF data. Then resubmit the command. 


ARS4922I ARSPDOCI completed code rc 
Explanation: The PDF indexer completed processing the input data with the completion code listed. 


User Response: No action is required. 


ARS4923E action version rc string 
Explanation: A message that displays the product version and release. 


User Response: No action is required. 


ARS4924E Error executing action API rc=rc Error string: string 
Explanation: The Adobe toolkit returned an error. 


User Response: Verify the directories that are specified on the FONTLIB and TEMPDIR parameters. Verify the 
directory permissions. Verify that the directories named on the FONTLIB parameter provide access to the fonts that 
are required by the PDF indexer. Verify that the directory named on the TEMPDIR parameter contains sufficient free 
space to process the input file. If the problem persists, contact your IBM Service Representative. 


ARS4925I Usage: arspdump -f filename [-F font_dir] [-h] [-o output file] 
[-p page number] [-t temp dir] 
Version: version 
-f: PDF file name 
-F: Font directory 
-h: This message 
-o: Output file (default is stdout) 
-p: Specifies the page number (default is all pages) 
-t: Temp directory 


Explanation: An incorrect parameter was specified for the command. 


User Response: Resubmit the command with the correct parameters. For more information about this command, 
see Chapter 6, “ARSPDUMP reference” on page 37. 


ARS4926I ------------- Page page ------------- 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4927I ------------- Rotated 90 degrees --------- 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4928I ------------- Rotated 180 degrees --------- 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4929I ------------- Rotated 270 degrees --------- 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4930I WordFinder version: version 
Explanation: This message is for your information only. 


User Response: No action is required. 


ARS4931I Number of Pages = page 
Explanation: This message is for your information only. 


User Response: No action is required. 

Chapter 5. ARSPDOCI reference 


Purpose 


Generate index data for a PDF file. 


Syntax 


Note: The following syntax should be used only when you run the ARSPDOCI 
program from the command line or call it from a user-defined program. 


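>>-ARSPDOCI--+--------------------+--FIELDn=spec--+------------------+-->
             '-COORDINATES=metric-'               '-FONTLIB=pathlist-'

>--INDEXn=spec--+------------------+--+-------------------------+-->
                '-INDEXDD=fileName-'  '-INDEXSTARTBY=pageNumber-'

>--INPUTDD=fileName--+----------------+--OUTPUTDD=fileName--PARMDD=fileName-->
                     '-MSGDD=fileName-'

>--+--------------------+--TRIGGERn=spec--><
   '-TEMPDIR=fileSystem-'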


Description 


The ARSPDOCI program can be used to index a PDF file. The ARSLOAD program 
automatically calls the ARSPDOCI program if the input data type is PDF and the 
indexer is PDF. If you need to index a PDF file and you do not want to use the 
ARSLOAD program to process the file, then you can run the ARSPDOCI program 
from the command line or call it from a program. 


Parameters 


Refer to Chapter 3, “Parameter reference” on page 17 for details about the 
parameters that you can specify when you run the ARSPDOCI program from the 
command line or a user-defined program. 


IFS location 


/usr/bin/arspdoci 
The executable program. 

Chapter 6. ARSPDUMP reference 


Purpose 


Print the locations of text strings on a page. 


Syntax 


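>>-ARSPDUMP-- -f--inputFile--+--------------+--+-----+--+-----------------+-->
                             '- -F--fontDir-'  '- -h-'  '- -o--outputFile-'

>--+------------------+--+--------------+--><
   '- -p--sheetNumber-'  '- -t--tempDir-'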


Description 


The ARSPDUMP program can be used to identify the locations of text strings on a 
page in a PDF file. You can use this program to help define triggers and fields. 
When you define triggers and fields, you must specify the location of the string 
value used to locate the trigger or field as x and y pairs in a coordinate system 
imposed on the page. For each string value, you must identify the upper left and 
lower right position on the page. The output of the ARSPDUMP program contains 
a list of the text strings on the page and the coordinates for each string. If a font is 
referenced in a PDF file, but not embedded, then the ARSPDUMP program 
attempts to find the font using information provided with the -F flag. If the 
ARSPDUMP program does not find the font, then it uses a substitute Adobe Type 
1 font. 


Parameters 


-f inputFile 
The file name or full path name of the PDF file to process. 


-F fontDir 
Identifies directories in which fonts are stored. Specify any valid path. Use 
the colon (:) character to separate path names. The ARSPDUMP program 
searches the paths in the order in which they are specified. If you do not 
specify this flag and name a font directory, then the ARSPDUMP program 
attempts to locate fonts in the /QIBM/ProdData/OnDemand/Adobe/fonts 
directory. 


-h Lists the parameters and their descriptions for the ARSPDUMP program. 


-o outputFile 
The file name or full path name of the file into which the ARSPDUMP 
program writes output messages. If you do not specify this flag and name 
a file, then the ARSPDUMP program writes output to the display 
(interactive) or the joblog (batch). 


-p sheetNumber 
The number of the page in the PDF file that you want the ARSPDUMP 
program to process. This is the page that contains the text strings that you 
want to use to define triggers and fields. The sheet number is the order of 
the page as it appears in the file, beginning with the number 1 (one), for 
the first page in the file. Contrast with page identifier, which is 
user-defined information that identifies each page (for example, iv, 5, and 
17-3). 


-t tempDir 
Identifies the directory that the ARSPDUMP program uses for temporary 
work space. Specify any valid directory name. If you do not specify this 
flag and name a directory, then the ARSPDUMP program uses the /arstmp 
directory for temporary work space. 


Examples 


The following example shows how to invoke the ARSPDUMP program within 
QSHELL to print the strings and locations of text found on page number three of 
sample.pdf to sample.out: 


arspdump -f sample.pdf -o sample.out -p 3 


See the IBM Content Manager OnDemand for iSeries Common Server Administration 
Guide for more information about running ARSPDUMP using QSHELL. 
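
When ARSPDUMP is run repeatedly (for example, once per page while you define triggers and fields), it can help to assemble the command from its flags programmatically. The following Python sketch only builds the argument list from the flags described in this chapter; it does not run the program. The function name build_arspdump_command and its argument names are hypothetical and are not part of OnDemand.

```python
import shlex


def build_arspdump_command(pdf_file, output_file=None, page=None,
                           font_dirs=None, temp_dir=None):
    """Return an arspdump argument list using the flags documented above.

    This helper is illustrative only; it is not supplied with OnDemand.
    """
    command = ["arspdump", "-f", pdf_file]
    if font_dirs:
        # -F takes one or more font directories separated by colons.
        command += ["-F", ":".join(font_dirs)]
    if output_file is not None:
        command += ["-o", output_file]
    if page is not None:
        if page < 1:
            raise ValueError("sheet numbers begin with 1")
        command += ["-p", str(page)]
    if temp_dir is not None:
        command += ["-t", temp_dir]
    return command


if __name__ == "__main__":
    # Rebuilds the documented example for page 3 of sample.pdf.
    print(shlex.join(build_arspdump_command("sample.pdf", "sample.out", 3)))
```

Returning a list rather than a single string keeps each argument separate, so file names that contain spaces do not need extra quoting when the list is handed to a process launcher.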


IFS location 


/usr/bin/arspdump 
The executable program. 

Part 3. Generic indexer reference 


This part of the book provides information about the OnDemand generic indexer. 
You can use the generic indexer to specify index data for other types of input files 
that you want to store in OnDemand. 

Chapter 7. Overview 


OnDemand provides the generic indexer to allow you to specify indexing 
information for input data that you cannot or do not want to index with the 
OS/400 Indexer or the PDF Indexer. For example, suppose that you want to load 
documents into OnDemand that were created with a word processor. The 
documents can be stored in OnDemand in the same format in which they were 
created. The documents can be retrieved from OnDemand and viewed with the 
word processor. However, because the documents do not contain SCS, AFP, line 
data, or PDF, you cannot index them with the standard OnDemand indexers. You 
can specify index information about the documents to the generic indexer and load 
the documents into OnDemand. Users can then search for and retrieve the 
documents with the OnDemand client program. 


To use the generic indexer, you must specify index data for each input file or 
document that you want to store in and retrieve from OnDemand. You specify the 
index data in a parameter file. The parameter file contains the index fields, index 
values, and information about the input files or documents to process. The generic 
indexer retrieves the index data from the parameter file and generates the index 
information that is loaded into the database. OnDemand creates one index record 
for each input file (or document) that you specify in the parameter file. The index 
record contains the index values that uniquely identify a file or document in 
OnDemand. 


The generic indexer supports group-level indexes. Group indexes are stored in the 
database and used to search for documents. You must specify one set of group 
indexes for each file or document that you want to process with the generic 
indexer. 


You use the ARSLOAD program (along with your generic indexer input files) to 
load data on the system. The ARSLOAD program will invoke the generic indexer 
to process the parameter file and generate the index data. The ARSLOAD program 
will then add the index information to the database and load the input files or 
documents specified in the parameter file into OnDemand. 


Processing AFP data 


You can specify a parameter file for input files that contain AFP resources and 
documents and process them with the generic indexer. However, when you specify 
the parameter file: 


* The starting location (byte offset) of the first AFP document in the input file 
should always be 0 (zero), even though the actual starting location is not zero 
when AFP resources are contained in the input. AFP resources are always 
located at the beginning of an input file. The actual starting location of the first 
document in the input file is zero plus the number of bytes that comprise the 
resources. However, to process AFP documents with the generic indexer, you do 
not need to calculate the number of bytes taken by the resources. 


* The starting locations of the other documents in the input file should be 
calculated using the length of and offset from the previous document in the 
input file. 

The generic indexer will determine where the AFP resources end in the file and 
process the documents using the offsets and lengths that you provide, relative to 
where the resources end. 
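
The offset rule above can be sketched in a few lines of Python. The sketch assumes that the documents follow one another in the input file without gaps and that you already know the length of each document; the function name group_offsets is hypothetical. Because offsets are relative to where the AFP resources end, the first offset is 0 and each later offset is the previous offset plus the previous length.

```python
def group_offsets(lengths):
    """Return the GROUP_OFFSET value for each document, given their lengths.

    Assumes the documents are stored back to back in the input file.
    """
    offsets = []
    position = 0  # the first document is always specified at offset 0
    for length in lengths:
        offsets.append(position)
        position += length  # next document starts where this one ends
    return offsets


if __name__ == "__main__":
    print(group_offsets([8124, 8124, 8124]))
```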

Chapter 8. Specifying the parameter file 


The input to the generic indexer program is the input file or files that you want to 
store in OnDemand and a parameter file that contains the indexing information for 
the files or documents. To use the generic indexer, you must create a parameter file 
that contains the indexing information for the files or documents that you want to 
process. This section describes the parameter file for the generic indexer. 


There are three types of statements that you can specify in a parameter file: 
* Comments. You can place a comment line anywhere in the parameter file. 


* Code page. You can specify one and only one code page line. If you specify a 
code page line, you must do so at the beginning of the parameter file, before 
you define any of your groups. 


* Groups. A group represents a document that you want to index. Each group 
contains the application group field names and their index values, the location 
of the document in the input file, the number of bytes (characters) that make up 
the document, and the name of the input file that contains the document. 


Note: The parameter names in the generic index file are case sensitive and must 
appear in uppercase. For example, GROUP_FIELD_NAME:account is valid, while 
group_field_name:account is not. 
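
To illustrate the case-sensitivity rule, the following Python sketch splits one parameter file line into its keyword and value and rejects any keyword that is not one of the uppercase names described in this chapter. The names split_statement and KNOWN_KEYWORDS are hypothetical helpers for checking a file before you load it; they are not part of OnDemand, and the generic indexer may apply additional rules.

```python
# Keywords described in this chapter; all must appear in uppercase.
KNOWN_KEYWORDS = {
    "CODEPAGE",
    "COMMENT",
    "GROUP_FIELD_NAME",
    "GROUP_FIELD_VALUE",
    "GROUP_OFFSET",
    "GROUP_LENGTH",
    "GROUP_FILENAME",
}


def split_statement(line):
    """Split 'KEYWORD:value' at the first colon and validate the keyword."""
    keyword, separator, value = line.rstrip("\n").partition(":")
    if not separator:
        raise ValueError(f"no colon found in line: {line!r}")
    if keyword not in KNOWN_KEYWORDS:
        # Lowercase or misspelled keywords end up here.
        raise ValueError(f"unknown or non-uppercase keyword: {keyword!r}")
    return keyword, value


if __name__ == "__main__":
    print(split_statement("GROUP_FIELD_NAME:account"))
```

Splitting at the first colon only means that a value which itself contains a colon, such as a path that begins with a drive letter, is kept intact.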


CODEPAGE: 
Specifies the code page of the input data. You can specify one and only one code 
page. The CODEPAGE: line must appear before you specify any of the groups. 
The CODEPAGE: line is required. 
Syntax 


CODEPAGE:cpgid 


Options and values 


The character string CODEPAGE: identifies the line as specifying the code page of 
the input data. The string cpgid can be any valid code page, a three to five 
character identifier of an IBM-registered or user-defined code page. 


Example 
The following illustrates how to specify a code page of 37 for the input data: 
CODEPAGE:37 


COMMENT: 
Specifies a comment line. You can place comment lines anywhere in the parameter 
file. 
Syntax 


COMMENT: text on a single line 

Options and values 


The character string COMMENT: identifies the line as containing a comment. 
Everything after the colon character to the end of the line is ignored. 


Example 
The following are examples of comment lines: 


COMMENT: 
COMMENT: this is a comment 


GROUP_FIELD_NAME: 


Specifies the name of an application group field. Each group that you specify in 
the parameter file must contain one GROUP_FIELD_NAME: line for each 
application group field. (The application group is where you store a file or 
document in OnDemand. You specify the name of the application group to the 
ARSLOAD program.) OnDemand supports up to 32 fields per application group. If 
the field names that you specify are different than the application group field 
names, then you must map the field names that you specify to the application 
group field names on the application Load Information page. 


Specify a pair of GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines for 
each application group field. For example, if the application group contains two 
fields, then each group that you specify in the parameter file must contain two 
pairs of GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines. The 
following is an example of a group with two application group fields: 


GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 


The group lines must appear after the CODEPAGE: line. 


Syntax 
GROUP_FIELD_NAME:applgrpFieldName 


Options and values 
The character string GROUP_FIELD_NAME: identifies the line as containing the 
name of an application group field. The string applgrpFieldName specifies the 
name of an application group field. OnDemand ignores the case of application 
group field names. 


Example 
The following shows some examples of application group field names: 


GROUP_FIELD_NAME:rdate 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_NAME:account# 


GROUP_FIELD_VALUE: 


Specifies an index value for an application group field. Each group that you 
specify in the parameter file must contain one GROUP_FIELD_VALUE: line for 
each application group field. (The application group is where you store a file or 
document in OnDemand. You specify the name of the application group to the 
ARSLOAD program.) OnDemand supports up to 32 fields per application group. 
The GROUP_FIELD_VALUE: line must follow the GROUP_FIELD_NAME: line 
for which you are specifying the index value. 


Specify a pair of GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines for 
each application group field. For example, if the application group contains two 
fields, then each group that you specify in the parameter file must contain two 
pairs of GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines. The 
following is an example of a group with two application group fields: 


GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 


The group lines must appear after the CODEPAGE: line. 
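
The pairing rule can be checked mechanically. The following Python sketch walks a list of (keyword, value) tuples for one group and pairs each GROUP_FIELD_NAME with the GROUP_FIELD_VALUE that follows it. It assumes that "follow" means the value is the next name or value statement after its name; the function name pair_fields is hypothetical and not part of OnDemand.

```python
def pair_fields(statements):
    """Pair each GROUP_FIELD_NAME with the GROUP_FIELD_VALUE that follows it.

    statements is a list of (keyword, value) tuples for a single group.
    Other keywords (for example COMMENT) are ignored by this check.
    """
    fields = []
    pending_name = None
    for keyword, value in statements:
        if keyword == "GROUP_FIELD_NAME":
            if pending_name is not None:
                raise ValueError(f"field {pending_name!r} has no value")
            pending_name = value
        elif keyword == "GROUP_FIELD_VALUE":
            if pending_name is None:
                raise ValueError("GROUP_FIELD_VALUE without a preceding GROUP_FIELD_NAME")
            fields.append((pending_name, value))
            pending_name = None
    if pending_name is not None:
        raise ValueError(f"field {pending_name!r} has no value")
    return fields


if __name__ == "__main__":
    group = [
        ("GROUP_FIELD_NAME", "rdate"),
        ("GROUP_FIELD_VALUE", "05/31/00"),
        ("GROUP_FIELD_NAME", "studentID"),
        ("GROUP_FIELD_VALUE", "0012345678"),
    ]
    print(pair_fields(group))
```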


Syntax 
GROUP_FIELD_VALUE:value 


Options and values 


The character string GROUP_FIELD_VALUE: identifies the line as containing an 
index value for an application group field. The string value specifies the actual 
index value for the field. 


Example 
The following shows some examples of index values: 


GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_VALUE:0012345678 
GROUP_FIELD_VALUE:0000-1111-2222-3333 


GROUP_FILENAME: 


The file name or full path name of the input file. If you do not specify a path, then 
the generic indexer searches the current directory for the specified file. 


Each group that you specify in the parameter file must contain one 
GROUP_FILENAME: line. The GROUP_FILENAME: line must follow the 
GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines that comprise a 
group. The following is an example of a group: 


GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 
GROUP_LENGTH:0 
GROUP_FILENAME:/tmp/statements.out 


If the GROUP_FILENAME line is blank (null), then the generic indexer uses the 
value of the GROUP_FILENAME line from the previous group to process the 
current group. In the following example, the input data for the second and third 
groups is retrieved from the input file that is specified for the first group: 

GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 
GROUP_LENGTH:8124 
GROUP_FILENAME:/tmp/statements.out 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:06/30/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:8124 
GROUP_LENGTH:8124 
GROUP_FILENAME: 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:07/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:16248 
GROUP_LENGTH:8124 
GROUP_FILENAME: 


If the first GROUP_FILENAME line in the parameter file is blank, then you must 
specify the name of the input file when you run the ARSLOAD program. 
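
The way blank GROUP_FILENAME lines inherit the previous value can be sketched as follows. This Python fragment is an illustration of the rule as described here, not the indexer's implementation; the function name resolve_filenames is hypothetical, and the default argument stands in for an input file name supplied when you run the ARSLOAD program.

```python
def resolve_filenames(filenames, default=None):
    """Replace blank GROUP_FILENAME values with the most recent non-blank one.

    filenames holds one value per group, in parameter file order.
    default represents a file name supplied outside the parameter file.
    """
    resolved = []
    previous = default
    for name in filenames:
        if name:
            previous = name  # a non-blank line becomes the current input file
        if previous is None:
            raise ValueError(
                "first GROUP_FILENAME is blank and no input file was supplied")
        resolved.append(previous)
    return resolved


if __name__ == "__main__":
    print(resolve_filenames(["/tmp/statements.out", "", ""]))
```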


The group lines must appear after the CODEPAGE: line. 


Syntax 
GROUP_FILENAME:fileName 


Options and values 


The character string GROUP_FILENAME: identifies the line as containing the 
input file to process. The string fileName specifies the file name or full path name 
of the input file. 


Example 
The following are valid file name lines: 


GROUP_FILENAME:/tmp/statements 
GROUP_FILENAME:D:\ARSTMP\statements 
GROUP_FILENAME:statements 
GROUP_FILENAME: 


GROUP_LENGTH: 


Specifies the number of contiguous bytes (characters) that comprise the document 
to be indexed. Specify 0 (zero) to indicate the entire input file or the remainder of 
the input file. Each group that you specify in the parameter file must contain one 
GROUP_LENGTH: line. The GROUP_LENGTH: line must follow the 
GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines that comprise a 
group. For example: 

GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 
GROUP_LENGTH:0 


The group lines must appear after the CODEPAGE: line. 


Syntax 
GROUP_LENGTH:value 

Options and values 


The character string GROUP_LENGTH: identifies the line as containing the byte 
count of the data to be indexed. The string value specifies the actual byte count. 
The default value is 0 (zero), which indicates the entire file or the remainder of the file. 


Example 
The following illustrates how to specify length values: 


GROUP_LENGTH:0 
GROUP_LENGTH:8124 


GROUP_OFFSET: 


Specifies the starting location (byte offset) into the input file of the data to be 
indexed. Specify 0 (zero) for the first byte (the beginning) of the file. (If you are 
processing AFP documents and resources with the generic indexer, see “Processing 
AFP data” on page 41.) Each group that you specify in the parameter file must 
contain one GROUP_OFFSET: line. The GROUP_OFFSET: line must follow the 
GROUP_FIELD_NAME: and GROUP_FIELD_VALUE: lines that comprise a 
group. For example: 


GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:05/31/00 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 


The group lines must appear after the CODEPAGE: line. 


Syntax 
GROUP_OFFSET:value 


Options and values 


The character string GROUP_OFFSET: identifies the line as containing the byte 
offset (location) of the data to be indexed. The string value specifies the actual byte 
offset. Specify 0 (zero), to indicate the beginning of the file. 


Example 
The following illustrates offset values for three documents from the same input 
file. The documents are 8 KB in length. 


GROUP_OFFSET:0 
GROUP_OFFSET:8124 
GROUP_OFFSET:16248 
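
Taken together, GROUP_OFFSET and GROUP_LENGTH select a contiguous range of bytes from the input file. The following Python sketch shows that selection on data already read into memory, with a length of 0 meaning the remainder of the file. It is a simplified illustration: the function name read_document is hypothetical, and the sketch does not account for AFP resources at the beginning of an input file.

```python
def read_document(data, offset, length):
    """Return the bytes of one document from the input data.

    offset is the zero-based starting byte; length 0 means 'to end of file'.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must not be negative")
    if offset > len(data):
        raise ValueError("offset is beyond the end of the input data")
    if length == 0:
        return data[offset:]  # entire file, or the remainder from offset
    end = offset + length
    if end > len(data):
        raise ValueError("document extends beyond the end of the input data")
    return data[offset:end]


if __name__ == "__main__":
    sample = bytes(range(10))
    print(read_document(sample, 2, 3))
    print(read_document(sample, 4, 0))
```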

Chapter 9. Parameter file examples 


The following example shows how to specify indexing information for three 
groups (documents). Each document will be indexed using two fields. The input 
data for each document is contained in a different input file. 


COMMENT: 
COMMENT: Generic Indexer Example 1 
COMMENT: Different input file for each document 
COMMENT: 
COMMENT: Specify code page of the index data 
CODEPAGE:37 
COMMENT: Document #1 
COMMENT: Index field #1 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:07/13/99 
COMMENT: Index field #2 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
COMMENT: document data starts at beginning of file 
GROUP_OFFSET:0 
COMMENT: document data goes to end of file 
GROUP_LENGTH:0 
GROUP_FILENAME:/arstmp/statement7.out 
COMMENT: Document #2 
COMMENT: Index field #1 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:08/13/99 
COMMENT: Index field #2 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 
GROUP_LENGTH:0 
GROUP_FILENAME:/arstmp/statement8.out 
COMMENT: Document #3 
COMMENT: Index field #1 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:09/13/99 
COMMENT: Index field #2 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
GROUP_OFFSET:0 
GROUP_LENGTH:0 
GROUP_FILENAME:/arstmp/statement9.out 
COMMENT: 
COMMENT: End Generic Indexer Example 1 


The following example shows how to specify indexing information for three 
groups (documents). Each document will be indexed using two fields. The input 
data for all of the documents is contained in the same input file. 


COMMENT: 
COMMENT: Generic Indexer Example 2 
COMMENT: One input file contains all documents 
COMMENT: 
COMMENT: Specify code page of the index data 
CODEPAGE:37 
COMMENT: Document #1 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:07/13/99 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
COMMENT: first document starts at beginning of file (byte 0) 
GROUP_OFFSET:0 
COMMENT: document length 8124 bytes 
GROUP_LENGTH:8124 
GROUP_FILENAME:/arstmp/accounting.student information.loan.out 
COMMENT: Document #2 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:08/13/99 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
COMMENT: second document starts at byte 8124 
GROUP_OFFSET:8124 
COMMENT: document length 8124 bytes 
GROUP_LENGTH:8124 
COMMENT: use prior GROUP_FILENAME: 
GROUP_FILENAME: 
COMMENT: Document #3 
GROUP_FIELD_NAME:rdate 
GROUP_FIELD_VALUE:09/13/99 
GROUP_FIELD_NAME:studentID 
GROUP_FIELD_VALUE:0012345678 
COMMENT: third document starts at byte 16248 
GROUP_OFFSET:16248 
COMMENT: document length 8124 bytes 
GROUP_LENGTH:8124 
COMMENT: use prior GROUP_FILENAME: 
GROUP_FILENAME: 
COMMENT: 
COMMENT: End Generic Indexer Example 2 
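

Parameter files such as these are often produced by a program rather than typed by hand. The following Python sketch renders a parameter file from a list of groups, writing the CODEPAGE: line first and then, for each group, the field name and value pairs followed by the offset, length, and file name lines. The function name render_parameter_file and the dictionary layout are assumptions made for this illustration; they are not part of OnDemand.

```python
def render_parameter_file(codepage, groups):
    """Return parameter file text for the given code page and groups.

    Each group is a dict with 'fields' (a list of (name, value) pairs),
    'offset', 'length', and 'filename' (use '' to reuse the prior file).
    """
    lines = [f"CODEPAGE:{codepage}"]  # must precede all groups
    for group in groups:
        for name, value in group["fields"]:
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append(f"GROUP_OFFSET:{group['offset']}")
        lines.append(f"GROUP_LENGTH:{group['length']}")
        lines.append(f"GROUP_FILENAME:{group['filename']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {"fields": [("rdate", "07/13/99"), ("studentID", "0012345678")],
         "offset": 0, "length": 8124, "filename": "/arstmp/loan.out"},
        {"fields": [("rdate", "08/13/99"), ("studentID", "0012345678")],
         "offset": 8124, "length": 8124, "filename": ""},
    ]
    print(render_parameter_file(37, example), end="")
```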

Appendix. Notices 


This information was developed for products and services offered in the U.S.A. 


IBM may not offer the products, services, or features discussed in this document in 
other countries. Consult your local IBM representative for information on the 
products and services currently available in your area. Any reference to an IBM 
product, program, or service is not intended to state or imply that only the IBM 
product, program, or service may be used. Any functionally equivalent product, 
program, or service that does not infringe on any IBM intellectual property right 
may be used instead. However, it is the user’s responsibility to evaluate and verify 
the operation of any non-IBM product, program, or service. 


IBM may have patents or pending patent applications covering subject matter 
described in this document. The furnishing of this document does not give you 
any license to these patents. You can send license inquiries, in writing, to: 


IBM Director of Licensing 
IBM Corporation 
North Castle Drive 
Armonk, NY 10504-1785 
U.S.A. 


For license inquiries regarding double-byte (DBCS) information, contact the IBM 
Intellectual Property Department in your country or send inquiries, in writing, to: 


IBM World Trade Asia Corporation 
Licensing 
2-31 Roppongi 3-chome, Minato-ku 
Tokyo 106, Japan 


The following paragraph does not apply to the United Kingdom or any other 
country where such provisions are inconsistent with local law: 
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS 
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER 
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS 
FOR A PARTICULAR PURPOSE. 

Some states do not allow disclaimer of express or implied warranties in certain 
transactions, therefore, this statement may not apply to you. 


This information could include technical inaccuracies or typographical errors. 
Changes are periodically made to the information herein; these changes will be 
incorporated in new editions of the publication. IBM may make improvements 
and/or changes in the product(s) and/or the program(s) described in this 
publication at any time without notice. 


Any references in this information to non-IBM Web sites are provided for 
convenience only and do not in any manner serve as an endorsement of those Web 
sites. The materials at those Web sites are not part of the materials for this IBM 
product and use of those Web sites is at your own risk. 

IBM may use or distribute any of the information you supply in any way it 
believes appropriate without incurring any obligation to you. 


Licensees of this program who wish to have information about it for the purpose 
of enabling: (i) the exchange of information between independently created 
programs and other programs (including this one) and (ii) the mutual use of the 
information which has been exchanged, should contact: 


IBM Corporation 
Software Interoperability Coordinator 
3605 Highway 52 N 
Rochester, MN 55901-7829 
U.S.A. 


Such information may be available, subject to appropriate terms and conditions, 
including in some cases, payment of a fee. 


The licensed program described in this information and all licensed material 
available for it are provided by IBM under terms of the IBM Customer Agreement, 
IBM International Program License Agreement, or any equivalent agreement 
between us. 


Any performance data contained herein was determined in a controlled 
environment. Therefore, results obtained in other operating environments may 
vary significantly. Some measurements may have been made on development-level 
systems and there is no guarantee that these measurements will be the same on 
generally available systems. Furthermore, some measurements may have been 
estimated through extrapolation. Actual results may vary. Users of this document 
should verify the applicable data for their specific environment. 


Information concerning non-IBM products was obtained from the suppliers of 
those products, their published announcements or other publicly available sources. 
IBM has not tested those products and cannot confirm the accuracy of 
performance, compatibility or any other claims related to non-IBM products. 
Questions on the capabilities of non-IBM products should be addressed to the 
suppliers of those products. 


This information contains examples of data and reports used in daily business 
operations. To illustrate them as completely as possible, the examples include the 
names of individuals, companies, brands, and products. All of these names are 
fictitious and any similarity to the names and addresses used by an actual business 
enterprise is entirely coincidental. 


COPYRIGHT LICENSE: 

This information contains sample application programs in source language, which 
illustrate programming techniques on various operating platforms. You may copy, 
modify, and distribute these sample programs in any form without payment to 
IBM, for the purposes of developing, using, marketing or distributing application 
programs conforming to the application programming interface for the operating 
platform for which the sample programs are written. These examples have not 
been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or 
imply reliability, serviceability, or function of these programs. You may copy, 
modify, and distribute these sample programs in any form without payment to 
IBM for the purposes of developing, using, marketing, or distributing application 
programs conforming to IBM’s application programming interfaces. 


If you are viewing this information softcopy, the photographs and color 
illustrations may not appear. 


Trademarks 


Advanced Function Presentation, AFP, IBM, iSeries, Operating System/400, 
OS/400, and Redbooks are trademarks of International Business Machines 
Corporation in the United States, other countries, or both. 


Adobe, the Adobe logo, Acrobat, and the Acrobat logo are trademarks of Adobe 
Systems Incorporated, which may be registered in certain jurisdictions. 


Java and all Java-based trademarks and logos are trademarks of Sun Microsystems, 
Inc. in the United States, other countries, or both. 


Other company, product, and service names may be trademarks or service marks 
of others. 